Optimizing Irregular Shared-Memory Applications for Clusters

Cited by: 0
Authors
Min, Seung-Jai [1]
Eigenmann, Rudolf [1]
Affiliation
[1] Purdue University, School of Electrical and Computer Engineering, West Lafayette, IN 47907, USA
Keywords
Compiler Analysis; Runtime Techniques; OpenMP; MPI; Irregular Data Accesses; Performance
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Irregular applications make communication hard to optimize because their irregular data accesses are difficult to analyze both accurately and efficiently. The challenge is especially acute when irregular shared-memory applications are translated into message-passing form for clusters. Without effective irregular data analysis, the translation system generates unnecessary or redundant communication, which limits application scalability. In this paper, we present a Lean Distributed Shared Memory (LDSM) system that features a fast and accurate irregular data access (IDA) analysis. The analysis uses a region-based diff method and relies on a runtime library optimized for irregular applications. We describe three optimizations that improve the performance of the LDSM system. A parallel array reduction transformation reduces the overheads of the analysis, while a packed communication optimization and a differential communication optimization effectively eliminate unnecessary and redundant messages. We evaluate the optimized LDSM system on a set of representative irregular benchmarks. On average, the optimized LDSM executes irregular applications 45% faster than their hand-tuned MPI counterparts.
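The abstract names a region-based diff method and packed communication without describing their internals. As a rough illustration only, the sketch below shows a generic twin-and-diff scheme of the kind used in software DSM systems. A process keeps a snapshot (the "twin") of a shared array, compares it with the array after irregular writes to find contiguous modified regions, and packs all regions into one payload so that a single message replaces one message per region. The names `diff_regions`, `pack_regions` and `unpack_regions` are hypothetical, and this is an assumed, simplified scheme, not the LDSM implementation.

```python
from typing import List, Optional, Tuple

# Hypothetical sketch of a generic twin-and-diff scheme; NOT the LDSM implementation.
# A region is a half-open index range [start, end) of a shared array.
Region = Tuple[int, int]


def diff_regions(twin: List[float], current: List[float]) -> List[Region]:
    """Return the maximal contiguous ranges where `current` differs from `twin`."""
    if len(twin) != len(current):
        raise ValueError("twin and current must have the same length")
    regions: List[Region] = []
    start: Optional[int] = None
    for i, (old, new) in enumerate(zip(twin, current)):
        if old != new:
            if start is None:
                start = i  # a modified run begins here
        elif start is not None:
            regions.append((start, i))  # the modified run ended at i - 1
            start = None
    if start is not None:
        regions.append((start, len(current)))  # run extends to the end
    return regions


def pack_regions(current: List[float], regions: List[Region]) -> Tuple[List[Region], List[float]]:
    """Copy every modified region into one flat payload (one message, not one per region)."""
    payload: List[float] = []
    for start, end in regions:
        payload.extend(current[start:end])
    return list(regions), payload


def unpack_regions(target: List[float], regions: List[Region], payload: List[float]) -> None:
    """Write the packed values back into the receiver's copy of the array."""
    offset = 0
    for start, end in regions:
        length = end - start
        target[start:end] = payload[offset:offset + length]
        offset += length


if __name__ == "__main__":
    twin = [0.0] * 8           # snapshot taken before the parallel loop
    current = list(twin)
    current[1], current[2], current[6] = 1.0, 2.0, 5.0  # irregular writes
    regions, payload = pack_regions(current, diff_regions(twin, current))
    print(regions, payload)    # [(1, 3), (6, 7)] [1.0, 2.0, 5.0]
    remote = list(twin)
    unpack_regions(remote, regions, payload)
    print(remote == current)   # True
```

Sending region descriptors plus one flat payload keeps the message count independent of how scattered the writes are; a real system would also have to decide when the twin is taken and which processes need each region, which this sketch does not model.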
Pages: 256-265
Page count: 10