Experiences of Converging Big Data Analytics Frameworks with High Performance Computing Systems

Cited by: 1
Authors
Cheng, Peng [1, 2]
Lu, Yutong [3]
Du, Yunfei [3]
Chen, Zhiguang [1, 2]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Changsha, Peoples R China
[2] State Key Lab High Performance Comp, Changsha, Peoples R China
[3] Natl Supercomp Ctr Guangzhou NSCC GZ, Guangzhou, Peoples R China
Source
Funding
National Key Research and Development Program of China;
Keywords
High performance computing; Big data; Convergence; File system; Hadoop;
DOI
10.1007/978-3-319-69953-0_6
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
With the rapid development of big data analytics frameworks, many existing high performance computing (HPC) facilities are evolving new capabilities to support big data analytics workloads. However, due to the different workload characteristics and optimization objectives of system architectures, migrating data-intensive applications to HPC systems that are geared for traditional compute-intensive applications presents a new challenge. In this paper, we address the critical question of how to accelerate complex applications that contain both data-intensive and compute-intensive workloads on the Tianhe-2 system by deploying an in-memory file system as data access middleware. We also characterize the impact of storage architecture on data-intensive MapReduce workloads when using Lustre as the underlying file system. Based on our characterization and findings of the performance behaviors, we propose a shared map output shuffle strategy and a file metadata cache layer to alleviate the impact of the metadata bottleneck. The evaluation of these optimization techniques shows a performance benefit of up to 17% for data-intensive workloads.
Pages: 90-106
Number of pages: 17
Related papers
50 in total
  • [1] Performance Analysis of Distributed Computing Frameworks for Big Data Analytics: Hadoop Vs Spark
    Ketu, Shwet
    Mishra, Pramod Kumar
    Agarwal, Sonali
    [J]. COMPUTACION Y SISTEMAS, 2020, 24 (02): : 669 - 686
  • [2] Big Data Analytics Frameworks
    Chandarana, Parth
    Vijayalakshmi, M.
    [J]. 2014 INTERNATIONAL CONFERENCE ON CIRCUITS, SYSTEMS, COMMUNICATION AND INFORMATION TECHNOLOGY APPLICATIONS (CSCITA), 2014, : 430 - 434
  • [3] HIGH-PERFORMANCE COMPUTING BASED BIG DATA ANALYTICS FOR SMART MANUFACTURING
    Yang, Yuhang
    Cai, Y. Dora
    Lu, Qiyue
    Zhang, Yifang
    Koric, Seid
    Shao, Chenhui
    [J]. PROCEEDINGS OF THE ASME 13TH INTERNATIONAL MANUFACTURING SCIENCE AND ENGINEERING CONFERENCE, 2018, VOL 3, 2018,
  • [4] Optimized load balancing in high-performance computing for big data analytics
    Mirtaheri, Seyedeh Leili
    Grandinetti, Lucio
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (16):
  • [5] Resilient Cities and Urban Analytics: The Role of Big Data and High Performance Pervasive Computing
    Marathe, Madhav
    [J]. COMPANION PROCEEDINGS OF THE SECOND ACM IKDD CONFERENCE ON DATA SCIENCES (CODS), 2015,
  • [6] Performance Evaluation of Big Data Frameworks for Large-Scale Data Analytics
    Veiga, Jorge
    Exposito, Roberto R.
    Pardo, Xoan C.
    Taboada, Guillermo L.
    Tourino, Juan
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 424 - 431
  • [7] High-Performance Computing for Data Analytics
    Perrin, Dimitri
    Bezbradica, Marija
    Crane, Martin
    Ruskin, Heather J.
    Duhamel, Christophe
    [J]. 2012 IEEE/ACM 16TH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED SIMULATION AND REAL TIME APPLICATIONS (DS-RT), 2012, : 234 - 242
  • [8] Spark versus Flink: Understanding Performance in Big Data Analytics Frameworks
    Marcu, Ovidiu-Cristian
    Costan, Alexandra
    Antoniu, Gabriel
    Perez-Hernandez, Maria S.
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2016, : 433 - 442
  • [9] Challenges in High Performance Big Data Frameworks
    Papadopoulos, Alessandro V.
    Maggio, Martina
    [J]. PROCEEDINGS 2018 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS), 2018, : 153 - 156
  • [10] Nomadic Computing for Big Data Analytics
    Yu, Hsiang-Fu
    Hsieh, Cho-Jui
    Yun, Hyokun
    Vishwanathan, S. V. N.
    Dhillon, Inderjit
    [J]. COMPUTER, 2016, 49 (04) : 52 - 60