Extending Shared-Memory Computations to Multiple Distributed Nodes

Cited: 0
Authors
Ahmed, Waseem [1]
Affiliation
[1] King Abdulaziz Univ, Dept Comp Sci, Fac Comp & Informat Technol, Jeddah, Saudi Arabia
Keywords
GPU; OpenMP; shared memory programming; distributed programming; CUDA; matrix multiplication; performance; programs
DOI
10.14569/IJACSA.2020.0110882
CLC number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
With the emergence of accelerators like GPUs, MICs and FPGAs, the availability of domain-specific libraries (like MKL), and the ease of parallelization associated with CUDA- and OpenMP-based shared-memory programming, node-based parallelization has recently become a popular choice among developers in the field of scientific computing. This is evident from the large volume of recently published work in various domains of scientific computing where shared-memory programming and accelerators have been used to accelerate applications. Although these approaches are suitable for small problem sizes, several issues need to be addressed before they can be applied to larger input domains. First, the primary focus of these works has been to accelerate the core kernel; acceleration of input/output operations is seldom considered. Many operations in scientific computing operate on large matrices, both sparse and dense, that are read from and written to external files. These input/output operations become bottlenecks and significantly affect the overall application time. Second, node-based parallelization prevents a developer from distributing the computation beyond a single node without learning an additional programming paradigm such as MPI. Third, the problem size that a node can effectively handle is limited by the memory of the node and of its accelerator. In this paper, an Asynchronous Multi-node Execution (AMNE) approach is presented that uses a unique combination of the shared file system and pseudo-replication to extend node-based algorithms to a distributed, multi-node implementation with minimal changes to the original node-based code. We demonstrate this approach by applying it to GEMM, a popular kernel in dense linear algebra, and show that the presented methodology significantly advances the state of the art in parallelization and scientific computing.
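As a rough illustration of the distribution scheme the abstract describes, the sketch below partitions A into row blocks, "pseudo-replicates" B by placing a single copy in a shared directory that stands in for the shared file system, and lets each simulated node run an unchanged node-local GEMM kernel on its block. This is a minimal sketch of the general idea only; the paper's actual AMNE mechanism may differ, and all function and file names here are hypothetical.

```python
import json
import os
import tempfile

def matmul(A, B):
    """Plain triple-loop GEMM on nested lists (the unchanged node-local kernel)."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def node_task(node_id, shared_dir):
    """Hypothetical per-node task: read this node's row block of A and the
    pseudo-replicated B from the shared file system, compute the matching
    block of C, and write it back to the shared file system."""
    with open(os.path.join(shared_dir, f"A_{node_id}.json")) as f:
        A_block = json.load(f)
    with open(os.path.join(shared_dir, "B.json")) as f:  # one copy, visible to all nodes
        B = json.load(f)
    C_block = matmul(A_block, B)
    with open(os.path.join(shared_dir, f"C_{node_id}.json"), "w") as f:
        json.dump(C_block, f)

def distributed_gemm(A, B, num_nodes, shared_dir):
    """Split A into row blocks, stage the inputs on the shared file system,
    run each node's task, then assemble C from the per-node output files."""
    step = (len(A) + num_nodes - 1) // num_nodes
    blocks = [A[i:i + step] for i in range(0, len(A), step)]
    with open(os.path.join(shared_dir, "B.json"), "w") as f:
        json.dump(B, f)
    for nid, blk in enumerate(blocks):
        with open(os.path.join(shared_dir, f"A_{nid}.json"), "w") as f:
            json.dump(blk, f)
    # Sequential here for simplicity; on a cluster these tasks would run
    # asynchronously on separate nodes.
    for nid in range(len(blocks)):
        node_task(nid, shared_dir)
    C = []
    for nid in range(len(blocks)):
        with open(os.path.join(shared_dir, f"C_{nid}.json")) as f:
            C.extend(json.load(f))
    return C
```

The node-local kernel is untouched; only the thin staging layer around it changes, which is the property the abstract claims for extending node-based code to multiple nodes.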
Pages: 675-685 (11 pages)