Towards Dependency-Aware Cache Management for Data Analytics Applications

Authors
Yu, Yinghao [1]
Zhang, Chengliang [2]
Wang, Wei [2]
Zhang, Jun [3]
Ben Letaief, Khaled [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Kowloon, Dept Elect & Comp Engn, Clear Water Bay, Hong Kong, Peoples R China
[2] Hong Kong Univ Sci & Technol, Kowloon, Dept Comp Sci & Engn, Clear Water Bay, Hong Kong, Peoples R China
[3] Hong Kong Polytech Univ, Kowloon, Dept Elect & Informat Engn, Hung Hom, Hong Kong, Peoples R China
Keywords
Cloud computing; data analytics system; cache management; dependency-awareness; all-or-nothing caching; performance
DOI
10.1109/TCC.2019.2945015
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Memory caches are being used aggressively in today's data analytics systems such as Spark, Tez, and Piccolo. The significant performance impact of caches and their limited sizes call for efficient cache management in data analytics clusters. However, prevalent data analytics systems employ rather simple cache management policies, notably Least Recently Used (LRU) and Least Frequently Used (LFU), that are oblivious to the application semantics of data dependency, expressed as directed acyclic graphs (DAGs). Without this knowledge, cache management can, at best, be performed by "guessing" the future data access patterns based on history, which frequently results in inefficient, erroneous caching with a low hit rate and a long response time. Worse still, the lack of data dependency knowledge makes it impossible to retain the all-or-nothing cache property of cluster applications, in that a compute task cannot be sped up unless all the data it depends on has been kept in main memory. In this paper, we propose a novel cache replacement policy, named Least Reference Count (LRC), which exploits the application's data dependency information to optimize cache management. LRC keeps track of the reference count of each data block, defined as the number of dependent child blocks that have not been computed yet, and always evicts the block with the smallest reference count. Furthermore, we incorporate the all-or-nothing requirement into LRC by jointly managing the reference counts of all the input data blocks for the same computation. We demonstrate the efficacy of LRC through both empirical analysis and cluster deployments against popular benchmarking workloads. Our Spark implementation shows that the proposed policies address the all-or-nothing requirement well and significantly improve cache performance.
Compared with LRU and a recently proposed caching policy called MEMTUNE, LRC improves the caching performance of typical workloads in production clusters by 22 and 284 percent, respectively.
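
The eviction rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Spark implementation: the class name `LRCCache`, the `dag` input format, and the methods `insert` and `mark_computed` are assumptions made for illustration, and the all-or-nothing extension is not modeled. The sketch only shows the basic idea that a block's reference count is the number of its child blocks not yet computed, and that the cached block with the smallest count is evicted first.

```python
from collections import defaultdict


class LRCCache:
    """Toy block cache that evicts the block with the smallest reference count."""

    def __init__(self, capacity, dag):
        # dag: hypothetical input mapping each block to the child blocks derived from it.
        # This sketch assumes capacity >= 1.
        self.capacity = capacity
        self.parents = defaultdict(list)
        # Reference count = number of dependent child blocks not yet computed.
        self.ref_count = defaultdict(int)
        for block, children in dag.items():
            self.ref_count[block] = len(children)
            for child in children:
                self.parents[child].append(block)
        self.cached = set()
        self.computed = set()

    def mark_computed(self, block):
        """Record that a block was computed; each parent loses one pending reference."""
        if block in self.computed:
            return
        self.computed.add(block)
        for parent in self.parents[block]:
            self.ref_count[parent] -= 1

    def insert(self, block):
        """Cache a block and return the evicted block, or None if nothing was evicted."""
        if block in self.cached:
            return None
        evicted = None
        if len(self.cached) >= self.capacity:
            # Evict the smallest reference count; break ties by name for determinism.
            evicted = min(self.cached, key=lambda b: (self.ref_count[b], b))
            self.cached.remove(evicted)
        self.cached.add(block)
        return evicted


# Example DAG (made up): A feeds C and D, B feeds D, C feeds E.
dag = {"A": ["C", "D"], "B": ["D"], "C": ["E"]}
cache = LRCCache(capacity=2, dag=dag)
cache.insert("A")            # reference count 2
cache.insert("B")            # reference count 1
cache.mark_computed("D")     # A drops to 1, B drops to 0
evicted = cache.insert("C")  # cache is full, so the lowest count is evicted
print(evicted)               # B
```

In this example an LRU policy would evict A, the least recently inserted block, even though A is still needed to compute C. The reference-count rule evicts B instead, because no pending computation depends on it.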
Pages: 706-723
Page count: 18