Towards Dependency-Aware Cache Management for Data Analytics Applications

Authors
Yu, Yinghao [1]
Zhang, Chengliang [2]
Wang, Wei [2]
Zhang, Jun [3]
Ben Letaief, Khaled [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Kowloon, Dept Elect & Comp Engn, Clear Water Bay, Hong Kong, Peoples R China
[2] Hong Kong Univ Sci & Technol, Kowloon, Dept Comp Sci & Engn, Clear Water Bay, Hong Kong, Peoples R China
[3] Hong Kong Polytech Univ, Kowloon, Dept Elect & Informat Engn, Hung Hom, Hong Kong, Peoples R China
Keywords
Cloud computing; data analytics system; cache management; dependency-awareness; all-or-nothing caching; PERFORMANCE;
DOI
10.1109/TCC.2019.2945015
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Memory caches are being used aggressively in today's data analytics systems such as Spark, Tez, and Piccolo. The significant performance impact of caches and their limited sizes call for efficient cache management in data analytics clusters. However, prevalent data analytics systems employ rather simple cache management policies, notably Least Recently Used (LRU) and Least Frequently Used (LFU), that are oblivious to the application semantics of data dependency, expressed as directed acyclic graphs (DAGs). Without this knowledge, cache management can, at best, be performed by "guessing" the future data access patterns based on history, which frequently results in inefficient, erroneous caching with a low hit rate and a long response time. Worse still, the lack of data dependency knowledge makes it impossible to retain the all-or-nothing cache property of cluster applications, in that a compute task cannot be sped up unless all the data it depends on has been kept in main memory. In this paper, we propose a novel cache replacement policy, named Least Reference Count (LRC), which exploits the application's data dependency information to optimize cache management. LRC keeps track of the reference count of each data block, defined as the number of dependent child blocks that have not been computed yet, and always evicts the block with the smallest reference count. Furthermore, we incorporate the all-or-nothing requirement into LRC by managing the reference counts of all the input data blocks for the same computation in a coordinated manner. We demonstrate the efficacy of LRC through both empirical analysis and cluster deployments against popular benchmarking workloads. Our Spark implementation shows that the proposed policies address the all-or-nothing requirement well and significantly improve the cache performance.
Compared with LRU and a recently proposed caching policy called MEMTUNE, LRC improves the caching performance of typical workloads in production clusters by 22 and 284 percent, respectively.
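The eviction rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Spark implementation: the class `LRCCache`, its methods, and the dictionary-based DAG encoding are hypothetical names chosen for illustration, and the coordinated all-or-nothing handling is left out. It assumes only what the abstract states, namely that a block's reference count is the number of its child blocks not yet computed and that the block with the smallest count is evicted first.

```python
from collections import defaultdict


class LRCCache:
    """Toy cache that evicts the block with the least reference count."""

    def __init__(self, capacity, dag):
        # dag maps each block id to the list of parent blocks it is computed from
        # (hypothetical encoding of the application's dependency DAG).
        self.capacity = capacity
        self.parents = dag
        # Reference count of a block = number of child blocks not yet computed.
        self.ref_count = defaultdict(int)
        for child, parents in dag.items():
            for parent in parents:
                self.ref_count[parent] += 1
        self.cached = set()

    def insert(self, block):
        """Cache a block, evicting lowest-count blocks while the cache is full."""
        if block in self.cached:
            return
        while len(self.cached) >= self.capacity:
            # Ties are broken by block id so the choice is deterministic.
            victim = min(self.cached, key=lambda b: (self.ref_count[b], b))
            self.cached.remove(victim)
        self.cached.add(block)

    def mark_computed(self, child):
        """Once a child is computed, each parent loses one pending reference."""
        for parent in self.parents.get(child, []):
            self.ref_count[parent] -= 1


# Block C depends on A and B; block D depends on A only.
cache = LRCCache(capacity=2, dag={"C": ["A", "B"], "D": ["A"]})
cache.insert("A")
cache.insert("B")
cache.mark_computed("C")  # A still has one pending child (D); B has none
cache.insert("C")         # the cache is full, so B (count 0) is evicted
```

In this example A stays cached because D still needs it, whereas a recency-based policy would have evicted A as the least recently inserted block.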
Pages: 706-723
Number of pages: 18
Related papers
  • [1] LRC: Dependency-Aware Cache Management for Data Analytics Clusters
    Yu, Yinghao
    Wang, Wei
    Zhang, Jun
    Ben Letaief, Khaled
    [J]. IEEE INFOCOM 2017 - IEEE CONFERENCE ON COMPUTER COMMUNICATIONS, 2017,
  • [2] LCRC: A Dependency-aware Cache Management Policy for Spark
    Wang, Bo
    Tang, Jie
    Zhang, Rui
    Ding, Wei
    Qi, Deyu
    [J]. 2018 IEEE INT CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, UBIQUITOUS COMPUTING & COMMUNICATIONS, BIG DATA & CLOUD COMPUTING, SOCIAL COMPUTING & NETWORKING, SUSTAINABLE COMPUTING & COMMUNICATIONS, 2018, : 956 - 963
  • [3] Dependency-Aware Data Locality for MapReduce
    Fan, Xiaoyi
    Ma, Xiaoqiang
    Liu, Jiangchuan
    Li, Dan
    [J]. 2014 IEEE 7TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (CLOUD), 2014, : 409 - 416
  • [4] Dependency-Aware Data Locality for MapReduce
    Ma, Xiaoqiang
    Fan, Xiaoyi
    Liu, Jiangchuan
    Li, Dan
    [J]. IEEE TRANSACTIONS ON CLOUD COMPUTING, 2018, 6 (03) : 667 - 679
  • [5] Dependency-aware Task Scheduling and Cache Placement in Vehicular Networks
    Zhang, Lintao
    Zhao, Caijin
    Wang, Yuanyu
    Tang, Yuliang
    Yang, Bo
    [J]. 2022 IEEE 95TH VEHICULAR TECHNOLOGY CONFERENCE (VTC2022-SPRING), 2022,
  • [6] Data-Aware Cache Management for Graph Analytics
    Sharma, Neelam
    Venkitaraman, Varun
    Newton
    Kumar, Vikash
    Singhania, Shubham
    Jha, Chandan Kumar
    [J]. PROCEEDINGS OF THE 2022 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2022), 2022, : 843 - 848
  • [7] Dependency-aware Form Understanding
    Zhang, Shaokun
    Li, Yuanchun
    Yan, Weixiang
    Guo, Yao
    Chen, Xiangqun
    [J]. 2021 IEEE 32ND INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING (ISSRE 2021), 2021, : 139 - 149
  • [8] CONFUZZIUS: A Data Dependency-Aware Hybrid Fuzzer for Smart Contracts
    Torres, Christof Ferreira
    Iannillo, Antonio Ken
    Gervais, Arthur
    State, Radu
    [J]. 2021 IEEE EUROPEAN SYMPOSIUM ON SECURITY AND PRIVACY (EUROS&P 2021), 2021, : 103 - 119
  • [9] Resource-Aware Cache Management for In-Memory Data Analytics Frameworks
    Zhao, Zhengyang
    Zhang, Haitao
    Geng, Xin
    Ma, Huadong
    [J]. 2019 IEEE INTL CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, BIG DATA & CLOUD COMPUTING, SUSTAINABLE COMPUTING & COMMUNICATIONS, SOCIAL COMPUTING & NETWORKING (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2019), 2019, : 364 - 371
  • [10] Dependency-aware Fault Tree Analysis
    Prohaska, Alexander
    [J]. 2021 5TH INTERNATIONAL CONFERENCE ON SYSTEM RELIABILITY AND SAFETY (ICSRS 2021), 2021, : 22 - 31