A Memory Efficient Algorithm for Adaptive Multidimensional Integration with Multiple GPUs

Cited by: 0

Authors:

Arumugam, Kamesh [1,2]

Godunov, Alexander [2,3]

Ranjan, Desh [1,2]

Terzic, Balsa [2,4]

Zuhair, Mohammad [1,2]

Affiliations:

[1] Old Dominion Univ, Dept Comp Sci, Norfolk, VA 23529 USA

[2] Old Dominion Univ, Ctr Accelerator Sci, Norfolk, VA 23529 USA

[3] Old Dominion Univ, Dept Phys, Norfolk, VA 23529 USA

[4] Ctr Adv Studies Accelerators, Jefferson Lab, Newport News, VA 23606 USA

Source:

2013 20TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC) | 2013

Keywords:

QUADRATURE;

DOI:

Not available

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

We present a memory-efficient algorithm and its implementation for multidimensional numerical integration on a cluster of compute nodes with multiple GPU devices per node. Effective use of shared memory is important for GPU performance because the bandwidth of global memory is limited. The best-known sequential algorithm for multidimensional numerical integration, CUHRE, uses a large dynamic heap data structure that is accessed frequently. Devising a GPU algorithm that caches part of this data structure in shared memory so as to minimize global memory accesses is a challenging task. The algorithm presented here addresses this problem. Furthermore, we propose a technique to scale this algorithm to multiple GPU devices. The algorithm was implemented on a cluster of Intel(R) Xeon(R) CPU X5650 compute nodes with 4 Tesla M2090 GPU devices per node. We observed a speedup of up to 240 on a single GPU device, compared to a speedup of 70 when memory optimization was not used. On a cluster of 6 nodes (24 GPU devices) we obtained a speedup of up to 3250. All speedups are measured relative to the sequential implementation running on the compute node.
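To make the role of the heap concrete, the following is a minimal sequential sketch of heap-driven adaptive integration in Python. It is not CUHRE and not the GPU algorithm from the paper: it assumes a simple midpoint rule with a half-step comparison as a stand-in for a real cubature rule, and the names `estimate` and `integrate` are hypothetical. It only illustrates the access pattern the abstract refers to, where the region with the largest error estimate is repeatedly popped, split, and its children pushed back.

```python
import heapq
import math


def estimate(f, lo, hi):
    """Return (value, error, split_axis) for the box [lo, hi].

    Assumption: a midpoint rule plus per-axis half-step refinements stands in
    for a real cubature rule with an embedded error estimator.
    """
    vol = math.prod(h - l for l, h in zip(lo, hi))
    mid = [(l + h) / 2 for l, h in zip(lo, hi)]
    coarse = vol * f(mid)
    value, error, split_axis, worst = coarse, 0.0, 0, -1.0
    for ax in range(len(lo)):
        q = (hi[ax] - lo[ax]) / 4
        left, right = mid.copy(), mid.copy()
        left[ax] -= q
        right[ax] += q
        # Two-point refinement along this axis only.
        refined = vol * (f(left) + f(right)) / 2
        diff = refined - coarse
        value += diff            # accumulate per-axis corrections
        error += abs(diff)       # sum of corrections as the error estimate
        if abs(diff) > worst:    # split along the axis that changed most
            worst, split_axis = abs(diff), ax
    return value, error, split_axis


def integrate(f, lo, hi, rel_tol=1e-3, max_regions=20000):
    """Adaptive integration driven by a max-heap keyed on the error estimate."""
    value, error, axis = estimate(f, list(lo), list(hi))
    # heapq is a min-heap, so store the negated error; the counter breaks ties.
    heap = [(-error, 0, value, list(lo), list(hi), axis)]
    total, total_err, counter = value, error, 1
    while total_err > rel_tol * abs(total) and len(heap) < max_regions:
        neg_err, _, val, rlo, rhi, ax = heapq.heappop(heap)
        total -= val
        total_err += neg_err
        cut = (rlo[ax] + rhi[ax]) / 2
        for a, b in ((rlo[ax], cut), (cut, rhi[ax])):
            clo, chi = rlo.copy(), rhi.copy()
            clo[ax], chi[ax] = a, b
            v, e, cax = estimate(f, clo, chi)
            heapq.heappush(heap, (-e, counter, v, clo, chi, cax))
            counter += 1
            total += v
            total_err += e
    # Recompute the sums once at the end to avoid accumulated rounding drift.
    total = math.fsum(item[2] for item in heap)
    total_err = math.fsum(-item[0] for item in heap)
    return total, total_err, len(heap)
```

For example, `integrate(lambda x: math.exp(-(x[0]**2 + x[1]**2)), [0.0, 0.0], [1.0, 1.0])` returns the estimate, the summed error estimate, and the number of regions left in the heap. Every iteration touches the heap at least three times (one pop, two pushes), which is why the placement of this structure in GPU memory matters for the approach described in the abstract.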


Pages: 169-175

Page count: 7

