Accelerating big data analytics on HPC clusters using two-level storage

Cited by: 11
Authors
Xuan, Pengfei [1]
Ligon, Walter B. [2]
Srimani, Pradip K. [1]
Ge, Rong [1]
Luo, Feng [1]
Affiliations
[1] Clemson Univ, Sch Comp, Clemson, SC 29634 USA
[2] Clemson Univ, Elect & Comp Engn, Clemson, SC 29634 USA
Funding
National Science Foundation (USA)
Keywords
Two-level storage; In-memory file system; Parallel file system; Data-intensive computing
DOI
10.1016/j.parco.2016.08.001
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Data-intensive applications that are inherently I/O bound have become a major workload on traditional high-performance computing (HPC) clusters. Simply employing data-intensive computing storage such as HDFS, or using the parallel file systems available on HPC clusters, to serve such applications incurs performance and scalability issues. In this paper, we present a novel two-level storage system that integrates an upper-level in-memory file system with a lower-level parallel file system. The former provides memory-speed I/O performance and the latter provides consistent, large-capacity storage. We build a two-level storage system prototype with Tachyon and OrangeFS, and analyze the resulting I/O throughput for typical MapReduce operations. Theoretical modeling and experiments show that the proposed two-level storage delivers higher aggregate I/O throughput than HDFS and OrangeFS and achieves scalable performance for both read and write. We expect this two-level storage approach to provide insights on system design for big data analytics on HPC clusters. © 2016 Elsevier B.V. All rights reserved.
Pages: 18-34
Page count: 17
Related papers
50 in total
  • [1] Big Data and HPC collocation: Using HPC idle resources for Big Data Analytics
    Mercier, Michael
    Glesser, David
    Georgiou, Yiannis
    Richard, Olivier
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017: 347-352
  • [2] Accelerating Big Data Analytics Using FPGAs
    Neshatpour, Katayoun
    Malik, Maria
    Ghodrat, Mohammad Ali
    Homayoun, Houman
    [J]. 2015 IEEE 23RD ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM), 2015: 164-164
  • [3] Accelerating Big Data Analytics Using Scale-up/out Heterogeneous Clusters
    Li, Zhuozhao
    Shen, Haiying
    Ward, Lee
    [J]. 2019 28TH INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND NETWORKS (ICCCN), 2019
  • [4] Accelerating I/O Performance of Big Data Analytics on HPC Clusters through RDMA-based Key-Value Store
    Islam, Nusrat Sharmin
    Shankar, Dipti
    Lu, Xiaoyi
    Wasi-ur-Rahman, Md.
    Panda, Dhabaleswar K.
    [J]. 2015 44TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP), 2015: 280-289
  • [5] Catching Failures of Failures at Big-Data Clusters: A Two-Level Neural Network Approach
    Rosa, Andrea
    Chen, Lydia Y.
    Binder, Walter
    [J]. 2015 IEEE 23RD INTERNATIONAL SYMPOSIUM ON QUALITY OF SERVICE (IWQOS), 2015: 231-236
  • [6] Big Data Analytics on HPC Architectures: Performance and Cost
    Xenopoulos, Peter
    Daniel, Jamison
    Matheson, Michael
    Sukumar, Sreenivas
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016: 2295-2304
  • [7] Editorial: Big scientific data analytics on HPC and cloud
    Wang, Jianwu
    Yin, Junqi
    Nguyen, Mai H.
    Wang, Jingbo
    Xu, Weijia
    [J]. FRONTIERS IN BIG DATA, 2024, 7
  • [8] A Two-level Cloud Storage System Based on Asynchronous Message for Medical Image Big Data
    Li, Wei
    Feng, Chaolu
    Jin, Ci
    Chen, Qiang
    Liu, Haining
    Zhao, Dazhe
    [J]. 5TH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING AND COMMUNICATIONS (BIGCOM 2019), 2019: 54-58
  • [9] A Two-Level Architecture for Data Warehousing and OLAP Over Big Data
    Dhaouadi, Asma
    Gammoudi, Mohamed Mohsen
    Hammoudi, Slimane
    [J]. VISION 2025: EDUCATION EXCELLENCE AND MANAGEMENT OF INNOVATIONS THROUGH SUSTAINABLE ECONOMIC COMPETITIVE ADVANTAGE, 2019: 7182-7194
  • [10] A two-level formal model for Big Data processing programs
    de Souza Neto, Joao Batista
    Moreira, Anamaria Martins
    Vargas-Solar, Genoveva
    Musicante, Martin A.
    [J]. SCIENCE OF COMPUTER PROGRAMMING, 2022, 215