ELS: Emulation system for debugging and tuning large-scale parallel programs on small clusters

Cited by: 0
Authors
Fang Lin
Yi Liu
Yayu Guo
Depei Qian
Affiliation
[1] Beihang University, School of Computer Science and Engineering
Keywords
High-performance computing; Emulation system; Performance tuning; Debugging; Large-scale parallel programs
DOI
Not available
Abstract
The continued scaling-up of high-performance computing (HPC) systems poses challenges for debugging and tuning large-scale parallel programs. First, to locate bugs in a program or tune its performance, programmers often need to execute the program repeatedly at a specified scale, which consumes massive resources. Second, because job scheduling systems are used extensively, programmers can only submit their programs as jobs and cannot interact with them, which restricts debugging efficiency and flexibility. To address these challenges, this paper proposes an emulation system that supports debugging and tuning of large-scale parallel programs by executing them at the desired scale on a small cluster. The program is first executed at the desired scale on the target HPC system to record the necessary information. Programmers can then choose a subset of the program's processes and re-execute it repeatedly on a small cluster; during re-execution, the emulation system controls the execution of the processes, and programmers can debug their programs by attaching tools to the selected processes. Moreover, the system supports the popular CPU+GPU heterogeneous architecture. The system is evaluated on a small cluster, with a 1000-node system serving as the target HPC system; experimental results demonstrate the accuracy and efficiency of emulation-execution.
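The record-then-re-execute workflow described in the abstract can be illustrated with a toy sketch. It is not the ELS implementation: the abstract does not say what the "necessary information" is, so the sketch assumes it is the messages exchanged between processes, and `run_ring`, `log`, and `selected` are hypothetical names. A full-scale run of a ring exchange records every message; a later run executes only a chosen subset of ranks and reads messages from the absent ranks out of the log.

```python
from collections import defaultdict


def run_ring(num_ranks, log, selected=None):
    """Toy ring exchange: each rank sends rank*10 to its right neighbour
    and adds the value received from its left neighbour to its own rank.

    selected=None -> full-scale run; every message is appended to `log`.
    selected=set  -> only those ranks execute; messages from ranks that
                     are not running are read back from `log`.
    """
    recording = selected is None
    active = list(range(num_ranks)) if recording else sorted(selected)
    cursor = defaultdict(int)  # per-channel read position, fresh on every run
    mailbox = {}               # messages produced by ranks running right now

    # Send phase.
    for rank in active:
        dst = (rank + 1) % num_ranks
        mailbox[(rank, dst)] = rank * 10
        if recording:
            log[(rank, dst)].append(rank * 10)

    # Receive phase.
    results = {}
    for rank in active:
        channel = ((rank - 1) % num_ranks, rank)
        if channel in mailbox:
            received = mailbox[channel]               # sender is running
        else:
            received = log[channel][cursor[channel]]  # sender is emulated
            cursor[channel] += 1
        results[rank] = rank + received
    return results


log = defaultdict(list)
full = run_ring(8, log)                     # recording run at full scale
subset = run_ring(8, log, selected={0, 1})  # re-execution of two ranks
assert all(subset[r] == full[r] for r in subset)
```

Because the log is read through a per-run cursor rather than consumed, the subset run can be repeated any number of times, which is the property a debugging loop would need.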
Pages: 1635-1666
Number of pages: 31