A Memory Efficient Algorithm for Adaptive Multidimensional Integration with Multiple GPUs

Authors
Arumugam, Kamesh [1, 2]
Godunov, Alexander [2, 3]
Ranjan, Desh [1, 2]
Terzic, Balsa [2, 4]
Zuhair, Mohammad [1, 2]
Affiliations
[1] Old Dominion Univ, Dept Comp Sci, Norfolk, VA 23529 USA
[2] Old Dominion Univ, Ctr Accelerator Sci, Norfolk, VA 23529 USA
[3] Old Dominion Univ, Dept Phys, Norfolk, VA 23529 USA
[4] Ctr Adv Studies Accelerators, Jefferson Lab, Newport News, VA 23606 USA
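Keywords
QUADRATURE
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract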
We present a memory-efficient algorithm and its implementation for multidimensional numerical integration on a cluster of compute nodes with multiple GPU devices per node. Effective use of shared memory is important for GPU performance because the bandwidth of global memory is limited. The best known sequential algorithm for multidimensional numerical integration, CUHRE, uses a large dynamic heap data structure that is accessed frequently. Devising a GPU algorithm that caches part of this data structure in shared memory so as to minimize global memory accesses is a challenging task. The algorithm presented here addresses this problem. Furthermore, we propose a technique to scale this algorithm to multiple GPU devices. The algorithm was implemented on a cluster of Intel(R) Xeon(R) CPU X5650 compute nodes with 4 Tesla M2090 GPU devices per node. We observed a speedup of up to 240 on a single GPU device, compared with a speedup of 70 when memory optimization was not used. On a cluster of 6 nodes (24 GPU devices), we obtained a speedup of up to 3250. All speedups are relative to the sequential implementation running on the compute node.
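The heap access pattern mentioned in the abstract can be illustrated with a minimal sequential sketch. This is not the authors' algorithm and not CUHRE itself: it assumes a one-dimensional integrand and swaps in Simpson's rule for CUHRE's cubature rules, and the names `simpson`, `evaluate`, and `adaptive_integrate` are hypothetical. It only shows why a globally adaptive integrator touches its heap on every refinement step, which is the data structure the paper proposes to cache partly in shared memory.

```python
import heapq
import math

def simpson(f, a, b):
    """Simpson's rule estimate of the integral of f over [a, b]."""
    m = 0.5 * (a + b)
    return (b - a) / 6.0 * (f(a) + 4.0 * f(m) + f(b))

def evaluate(f, a, b):
    """Return (estimate, error) for [a, b]. The error is the gap between
    one Simpson panel and two half-width panels (an illustrative choice)."""
    m = 0.5 * (a + b)
    coarse = simpson(f, a, b)
    fine = simpson(f, a, m) + simpson(f, m, b)
    return fine, abs(fine - coarse)

def adaptive_integrate(f, a, b, tol=1e-8, max_regions=10000):
    """Globally adaptive integration: always split the worst region."""
    est, err = evaluate(f, a, b)
    # heapq is a min-heap, so store the negated error to pop the
    # region with the largest error estimate first.
    heap = [(-err, a, b, est)]
    total, total_err = est, err
    while total_err > tol and len(heap) < max_regions:
        neg_err, lo, hi, old_est = heapq.heappop(heap)  # one heap read per step
        mid = 0.5 * (lo + hi)
        total -= old_est
        total_err += neg_err  # neg_err is -error, so this subtracts it
        for left, right in ((lo, mid), (mid, hi)):
            sub_est, sub_err = evaluate(f, left, right)
            heapq.heappush(heap, (-sub_err, left, right, sub_est))  # two heap writes
            total += sub_est
            total_err += sub_err
    return total, total_err

if __name__ == "__main__":
    value, error = adaptive_integrate(math.sin, 0.0, math.pi)
    print(value, error)
```

Every iteration performs one pop and two pushes, so the heap is on the critical path of the whole computation. On a GPU, keeping all of it in global memory would make each refinement step pay global memory latency, which is the cost the abstract says the memory-efficient algorithm is designed to reduce.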
Pages: 169-175
Number of pages: 7