Towards Dependency-Aware Cache Management for Data Analytics Applications

Cited by: 0

Authors:

Yu, Yinghao [1]

Zhang, Chengliang [2]

Wang, Wei [2]

Zhang, Jun [3]

Ben Letaief, Khaled [1]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Kowloon, Dept Elect & Comp Engn, Clear Water Bay, Hong Kong, Peoples R China

[2] Hong Kong Univ Sci & Technol, Kowloon, Dept Comp Sci & Engn, Clear Water Bay, Hong Kong, Peoples R China

[3] Hong Kong Polytech Univ, Kowloon, Dept Elect & Informat Engn, Hung Hom, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON CLOUD COMPUTING | 2022 / Vol. 10 / No. 01

Keywords:

Cloud computing; data analytics system; cache management; dependency-awareness; all-or-nothing caching; PERFORMANCE;

DOI:

10.1109/TCC.2019.2945015

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Memory caches are being used aggressively in today's data analytics systems such as Spark, Tez, and Piccolo. The significant performance impact of caches and their limited sizes call for efficient cache management in data analytics clusters. However, prevalent data analytics systems employ rather simple cache management policies, notably Least Recently Used (LRU) and Least Frequently Used (LFU), that are oblivious to the application semantics of data dependency, expressed as directed acyclic graphs (DAGs). Without this knowledge, cache management can, at best, be performed by "guessing" the future data access patterns based on history, which frequently results in inefficient, erroneous caching with a low hit rate and a long response time. Worse still, the lack of data dependency knowledge makes it impossible to retain the all-or-nothing cache property of cluster applications, in that a compute task cannot be sped up unless all the data it depends on has been kept in main memory. In this paper, we propose a novel cache replacement policy, named Least Reference Count (LRC), which exploits the application's data dependency information to optimize cache management. LRC keeps track of the reference count of each data block, defined as the number of dependent child blocks that have not been computed yet, and always evicts the block with the smallest reference count. Furthermore, we incorporate the all-or-nothing requirement into LRC by coordinately managing the reference counts of all the input data blocks for the same computation. We demonstrate the efficacy of LRC through both empirical analysis and cluster deployments against popular benchmarking workloads. Our Spark implementation shows that the proposed policies address the all-or-nothing requirement well and significantly improve cache performance. Compared with LRU and a recently proposed caching policy called MEMTUNE, LRC improves the caching performance of typical workloads in production clusters by 22 and 284 percent, respectively.
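The basic eviction rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Spark implementation: the class name `LRCCache`, the `dag` dictionary format (child block mapped to its parent blocks), the counting of capacity in blocks, and the tie-break by block name are all assumptions made for illustration, and the all-or-nothing extension is not modeled.

```python
from collections import defaultdict


class LRCCache:
    """Toy cache that evicts the block with the least reference count.

    Hypothetical illustration only; names and data layout are assumptions.
    """

    def __init__(self, capacity, dag):
        # dag maps each child block to the parent blocks it is computed from.
        self.capacity = capacity  # number of blocks the cache can hold (>= 1)
        self.dag = dag
        self.cached = set()
        # Reference count of a block = number of child blocks not yet computed.
        self.ref_count = defaultdict(int)
        for parents in dag.values():
            for parent in parents:
                self.ref_count[parent] += 1

    def on_computed(self, block):
        """Record that `block` has been computed, then cache it."""
        # Each parent now has one fewer uncomputed child.
        for parent in self.dag.get(block, []):
            self.ref_count[parent] -= 1
        self.insert(block)

    def insert(self, block):
        """Cache `block`, evicting the least-referenced cached block if full."""
        if block in self.cached:
            return
        if len(self.cached) >= self.capacity:
            # Assumed tie-break: among equal counts, evict the smallest name.
            victim = min(self.cached, key=lambda b: (self.ref_count[b], b))
            self.cached.remove(victim)
        self.cached.add(block)


# Example DAG (assumed): C is computed from A and B, D from B alone.
cache = LRCCache(capacity=2, dag={"C": ["A", "B"], "D": ["B"]})
for name in ["A", "B", "C"]:
    cache.on_computed(name)
print(sorted(cache.cached))
```

In this example, block A has no remaining uncomputed children once C is computed, so it is evicted to make room, while B stays cached because D still depends on it. A recency-based policy has no such information and would have to guess from access history.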


Pages: 706-723

Page count: 18

