BigCache for Big-data Systems

Authors
Roger, Michel Angelo [1]
Xu, Yiqi [1]
Zhao, Ming [1]
Affiliations
[1] Florida International University, Miami, FL 33199, USA
Keywords
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Big-data systems are increasingly used in many disciplines for important tasks such as knowledge discovery and decision making by processing large volumes of data. Big-data systems rely on hard-disk drive (HDD) based storage to provide the necessary capacity. However, as big-data applications grow rapidly more diverse and demanding, HDD storage becomes insufficient to satisfy their performance requirements. Emerging solid-state drives (SSDs) promise great IO performance that can be exploited by big-data applications, but they still face serious limitations in capacity, cost, and endurance and therefore must be strategically incorporated into big-data systems. This paper presents BigCache, an SSD-based distributed caching layer for big-data systems. It is designed to be seamlessly integrated with existing big-data systems and transparently accelerate IOs for diverse big-data applications. The management of the distributed SSD caches in BigCache is coordinated with the job management of big-data systems in order to support cache-locality-driven job scheduling. BigCache is prototyped in Hadoop to provide caching upon HDFS for MapReduce applications. It is evaluated using typical MapReduce applications, and the results show that BigCache reduces the runtime of WordCount by 38% and the runtime of TeraSort by 52%. The results also show that BigCache is able to achieve significant speedup by caching only partial input for the benchmarks, owing to its ability to cache partial input and its replacement policy that recognizes application access patterns.
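The abstract gives no algorithmic detail, so the following is only a minimal sketch of what cache-locality-driven scheduling could look like, not the method used in BigCache. It assumes a scheduler that knows which input blocks each node holds in its SSD cache; the function name `assign_tasks` and all data structures are hypothetical and do not come from the paper or from Hadoop.

```python
def assign_tasks(tasks, cached_blocks, free_slots):
    """Greedy, locality-first task placement (illustrative assumption only).

    tasks:         dict mapping task id -> id of the input block it reads
    cached_blocks: dict mapping node name -> set of block ids in its SSD cache
    free_slots:    dict mapping node name -> number of free task slots
    Returns a dict mapping task id -> chosen node name.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignment = {}
    pending = []

    # Pass 1: prefer a node whose cache already holds the task's input block.
    for task_id, block in tasks.items():
        local_nodes = [
            node for node in sorted(slots)
            if slots[node] > 0 and block in cached_blocks.get(node, set())
        ]
        if local_nodes:
            node = local_nodes[0]
            assignment[task_id] = node
            slots[node] -= 1
        else:
            pending.append(task_id)

    # Pass 2: place the remaining tasks on any node that still has a free slot.
    for task_id in pending:
        node = next((n for n in sorted(slots) if slots[n] > 0), None)
        if node is None:
            break  # cluster is full; leftover tasks stay unassigned
        assignment[task_id] = node
        slots[node] -= 1

    return assignment
```

In this sketch a task whose input is cached on some node with a free slot runs there, and every other task falls back to any available node, which is the general idea behind favoring cache hits when placing work.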
Pages: 189-194
Page count: 6