Improving compiler and run-time support for irregular reductions using local writes

Cited by: 1
Authors
Han, HS [1]
Tseng, CW [1]
Affiliation
[1] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA
DOI
10.1007/3-540-48319-5_12
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
Current compilers for distributed-memory multiprocessors parallelize irregular reductions either by generating calls to sophisticated run-time systems (CHAOS) or by relying on replicated buffers and the shared-memory interface supported by software DSMs (TreadMarks). We introduce LOCALWRITE, a new technique for parallelizing irregular reductions based on the owner-computes rule. It eliminates the need for buffers or synchronized writes, but may replicate computation. We investigate the impact of connectivity (node/edge ratio), locality (accesses to local data) and adaptivity (edge modifications) on their relative performance. LOCALWRITE improves performance by 50-150% compared to using replicated buffers, and can match or exceed gather/scatter for applications with low locality or high adaptivity.
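The contrast the abstract draws between replicated buffers and the owner-computes approach can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the authors' implementation: the function names, the `weight` callback, the edge list, and the `OWNER` partition are all hypothetical, and "processors" are simulated by loops rather than real parallel workers.

```python
# Illustrative sketch only; all names and data below are assumptions.

def serial_reduction(num_nodes, edges, weight):
    """Reference irregular reduction: updates reach y through edge indices."""
    y = [0] * num_nodes
    for u, v in edges:
        f = weight(u, v)
        y[u] += f
        y[v] -= f
    return y


def replicated_buffers(num_nodes, edges, weight, num_procs):
    """Each simulated processor accumulates into a private full-size buffer."""
    buffers = [[0] * num_nodes for _ in range(num_procs)]
    for p in range(num_procs):
        for u, v in edges[p::num_procs]:  # this processor's share of the edges
            f = weight(u, v)
            buffers[p][u] += f
            buffers[p][v] -= f
    # Combining step: sum the private buffers element by element.
    return [sum(buf[i] for buf in buffers) for i in range(num_nodes)]


def local_write(num_nodes, edges, weight, owner, num_procs):
    """Owner-computes: a processor writes only to the nodes it owns.

    An edge whose endpoints have different owners is evaluated by both
    owners, which is the replicated computation the abstract mentions.
    """
    y = [0] * num_nodes
    evaluations = 0
    for p in range(num_procs):
        for u, v in edges:
            if owner[u] != p and owner[v] != p:
                continue  # neither endpoint is local; skip this edge
            f = weight(u, v)
            evaluations += 1
            if owner[u] == p:
                y[u] += f
            if owner[v] == p:
                y[v] -= f
    return y, evaluations


# Hypothetical 6-node ring split between two simulated processors.
NUM_NODES = 6
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]
OWNER = [0, 0, 0, 1, 1, 1]


def edge_weight(u, v):
    # Arbitrary integer weight so that results compare exactly.
    return 10 * u + v
```

In this toy setup, the two edges that cross the partition are evaluated twice under the owner-computes scheme, while the replicated-buffer version evaluates each edge once but needs one full-size buffer per processor and a combining pass. How that trade-off plays out in practice depends on connectivity, locality, and adaptivity, as the abstract notes.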
Pages: 181-196
Number of pages: 16