Towards Dependency-Aware Cache Management for Data Analytics Applications

Cited by: 0

Authors:

Yu, Yinghao [1]

Zhang, Chengliang [2]

Wang, Wei [2]

Zhang, Jun [3]

Ben Letaief, Khaled [1]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Kowloon, Dept Elect & Comp Engn, Clear Water Bay, Hong Kong, Peoples R China

[2] Hong Kong Univ Sci & Technol, Kowloon, Dept Comp Sci & Engn, Clear Water Bay, Hong Kong, Peoples R China

[3] Hong Kong Polytech Univ, Kowloon, Dept Elect & Informat Engn, Hung Hom, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON CLOUD COMPUTING | 2022 / Vol. 10 / No. 01

Keywords:

Cloud computing; data analytics system; cache management; dependency-awareness; all-or-nothing caching; PERFORMANCE;

DOI:

10.1109/TCC.2019.2945015

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812;

Abstract:

Memory caches are being used aggressively in today's data analytics systems such as Spark, Tez, and Piccolo. The significant performance impact of caches and their limited sizes call for efficient cache management in data analytics clusters. However, prevalent data analytics systems employ rather simple cache management policies, notably Least Recently Used (LRU) and Least Frequently Used (LFU), that are oblivious to the application semantics of data dependency, expressed as directed acyclic graphs (DAGs). Without this knowledge, cache management can, at best, be performed by "guessing" the future data access patterns based on history, which frequently results in inefficient, erroneous caching with a low hit rate and a long response time. Worse still, the lack of data dependency knowledge makes it impossible to retain the all-or-nothing cache property of cluster applications, in that a compute task cannot be sped up unless all the dependent data has been kept in the main memory. In this paper, we propose a novel cache replacement policy, named Least Reference Count (LRC), which exploits the application's data dependency information to optimize the cache management. LRC keeps track of the reference count of each data block, defined as the number of dependent child blocks that have not been computed yet, and always evicts the block with the smallest reference count. Furthermore, we incorporate the all-or-nothing requirement into LRC by coordinately managing the reference counts of all the input data blocks for the same computation. We demonstrate the efficacy of LRC through both empirical analysis and cluster deployments against popular benchmarking workloads. Our Spark implementation shows that the proposed policies address the all-or-nothing requirement well and significantly improve the cache performance. Compared with LRU and a recently proposed caching policy called MEMTUNE, LRC improves the caching performance of typical workloads in production clusters by 22 and 284 percent, respectively.
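
The eviction rule described in the abstract can be illustrated with a small sketch. This is not the authors' Spark implementation: the class name `LRCCache`, the dictionary-based DAG format, and the alphabetical tie-breaking are assumptions made only for illustration. It covers the basic reference-count rule and leaves out the all-or-nothing coordination of input blocks.

```python
class LRCCache:
    """Toy Least Reference Count cache (illustrative sketch, not the paper's code)."""

    def __init__(self, capacity, dag):
        # `dag` maps each block to the child blocks computed from it
        # (a hypothetical input format chosen for this example).
        self.capacity = capacity
        self.children = {block: set(kids) for block, kids in dag.items()}
        # Reference count = number of dependent child blocks not yet computed.
        self.ref_count = {block: len(kids) for block, kids in self.children.items()}
        self.cached = set()
        self.computed = set()

    def mark_computed(self, block):
        """Record that `block` has been computed and decrement its parents' counts."""
        if block in self.computed:
            return
        self.computed.add(block)
        for parent, kids in self.children.items():
            if block in kids:
                self.ref_count[parent] -= 1

    def insert(self, block):
        """Cache `block`; if the cache is full, evict the block with the
        smallest reference count first. Returns the evicted block or None."""
        if block in self.cached:
            return None
        evicted = None
        if len(self.cached) >= self.capacity:
            # Ties are broken by block name (an assumption, for determinism).
            evicted = min(self.cached, key=lambda b: (self.ref_count.get(b, 0), b))
            self.cached.remove(evicted)
        self.cached.add(block)
        return evicted


# Block A feeds C and D; block B feeds only D.
dag = {"A": ["C", "D"], "B": ["D"], "C": [], "D": []}
cache = LRCCache(capacity=2, dag=dag)
cache.insert("A")
cache.insert("B")
cache.mark_computed("D")     # A still has C pending; B has nothing pending
evicted = cache.insert("C")  # cache is full, so one block must be evicted
```

In this example B is evicted because no uncomputed child depends on it any longer, while A is retained because C has not been computed yet. A recency-based policy could not make that distinction without access to the DAG.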


Pages: 706-723

Number of pages: 18
