Extending Shared-Memory Computations to Multiple Distributed Nodes

Cited: 0
Authors
Ahmed, Waseem [1]
Affiliation
[1] King Abdulaziz Univ, Dept Comp Sci, Fac Comp & Informat Technol, Jeddah, Saudi Arabia
Keywords
GPU; OpenMP; shared memory programming; distributed programming; CUDA; matrix multiplication; performance; programs
DOI
10.14569/IJACSA.2020.0110882
CLC number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
With the emergence of accelerators like GPUs, MICs and FPGAs, the availability of domain-specific libraries (like MKL), and the ease of parallelization associated with CUDA- and OpenMP-based shared-memory programming, node-based parallelization has recently become a popular choice among developers in the field of scientific computing. This is evident from the large volume of recently published work in various domains of scientific computing where shared-memory programming and accelerators have been used to accelerate applications. Although these approaches are suitable for small problem sizes, several issues need to be addressed before they can be applied to larger input domains. First, the primary focus of these works has been to accelerate the core kernel; acceleration of input/output operations is seldom considered. Many operations in scientific computing operate on large matrices, both sparse and dense, that are read from and written to external files. These input/output operations become bottlenecks and significantly affect the overall application time. Second, node-based parallelization prevents a developer from distributing the computation beyond a single node without learning an additional programming paradigm such as MPI. Third, the problem size that a node can effectively handle is limited by the memory of the node and of its accelerator. In this paper, an Asynchronous Multi-node Execution (AMNE) approach is presented that uses a unique combination of the shared file system and pseudo-replication to extend node-based algorithms to a distributed, multi-node implementation with minimal changes to the original node-based code. We demonstrate this approach by applying it to GEMM, a popular kernel in dense linear algebra, and show that the presented methodology significantly advances the state of the art in parallelization and scientific computing.
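As a rough illustration of the distribution scheme the abstract describes, the sketch below partitions A into row blocks, "pseudo-replicates" B by placing a single copy in a shared directory that stands in for the shared file system, and lets each simulated node run an unchanged node-local GEMM kernel on its block. This is a minimal sketch of the general idea only; the paper's actual AMNE mechanism may differ, and all function and file names here are hypothetical.

```python
import json
import os
import tempfile

def matmul(A, B):
    """Plain triple-loop GEMM on nested lists (the unchanged node-local kernel)."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def node_task(node_id, shared_dir):
    """Hypothetical per-node task: read this node's row block of A and the
    pseudo-replicated B from the shared file system, compute the matching
    block of C, and write it back to the shared file system."""
    with open(os.path.join(shared_dir, f"A_{node_id}.json")) as f:
        A_block = json.load(f)
    with open(os.path.join(shared_dir, "B.json")) as f:  # one copy, visible to all nodes
        B = json.load(f)
    C_block = matmul(A_block, B)
    with open(os.path.join(shared_dir, f"C_{node_id}.json"), "w") as f:
        json.dump(C_block, f)

def distributed_gemm(A, B, num_nodes, shared_dir):
    """Split A into row blocks, stage the inputs on the shared file system,
    run each node's task, then assemble C from the per-node output files."""
    step = (len(A) + num_nodes - 1) // num_nodes
    blocks = [A[i:i + step] for i in range(0, len(A), step)]
    with open(os.path.join(shared_dir, "B.json"), "w") as f:
        json.dump(B, f)
    for nid, blk in enumerate(blocks):
        with open(os.path.join(shared_dir, f"A_{nid}.json"), "w") as f:
            json.dump(blk, f)
    # Sequential here for simplicity; on a cluster these tasks would run
    # asynchronously on separate nodes.
    for nid in range(len(blocks)):
        node_task(nid, shared_dir)
    C = []
    for nid in range(len(blocks)):
        with open(os.path.join(shared_dir, f"C_{nid}.json")) as f:
            C.extend(json.load(f))
    return C
```

The node-local kernel is untouched; only the thin staging layer around it changes, which is the property the abstract claims for extending node-based code to multiple nodes.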
Pages: 675-685 (11 pages)