A Massively Parallel and Scalable Multi-GPU Material Point Method

Cited by: 33
Authors
Wang, Xinlei [1, 2]
Qiu, Yuxing [2, 3]
Slattery, Stuart R. [4 ]
Fang, Yu [2 ]
Li, Minchen [2 ]
Zhu, Song-Chun [3 ]
Zhu, Yixin [3 ]
Tang, Min [1 ]
Manocha, Dinesh [5 ]
Jiang, Chenfanfu [2 ]
Affiliations
[1] Zhejiang Univ, Hangzhou, Peoples R China
[2] Univ Penn, Philadelphia, PA 19104 USA
[3] Univ Calif Los Angeles, Los Angeles, CA USA
[4] Oak Ridge Natl Lab, Oak Ridge, TN USA
[5] Univ Maryland, College Pk, MD 20742 USA
Source
ACM TRANSACTIONS ON GRAPHICS | 2020 / Vol. 39 / No. 04
Funding
National Key Research and Development Program of China
Keywords
Numerical methods; parallel computing; GPU; smoothed particle hydrodynamics; simulation
DOI
10.1145/3386569.3392442
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Harnessing the power of modern multi-GPU architectures, we present a massively parallel simulation system based on the Material Point Method (MPM) for simulating physical behaviors of materials undergoing complex topological changes, self-collision, and large deformations. Our system makes three critical contributions. First, we introduce a new particle data structure that promotes coalesced memory access patterns on the GPU and eliminates the need for complex atomic operations on the memory hierarchy when writing particle data to the grid. Second, we propose a kernel fusion approach using a new Grid-to-Particles-to-Grid (G2P2G) scheme, which efficiently reduces GPU kernel launches, improves latency, and significantly reduces the amount of global memory needed to store particle data. Finally, we introduce optimized algorithmic designs that allow for efficient sparse grids in a shared memory context, enabling us to best utilize modern multi-GPU computational platforms for hybrid Lagrangian-Eulerian computational patterns. We demonstrate the effectiveness of our method with extensive benchmarks, evaluations, and dynamic simulations with elastoplasticity, granular media, and fluid dynamics. In comparisons against an open-source and heavily optimized CPU-based MPM codebase [Fang et al. 2019] on an elastic sphere colliding scene with particle counts ranging from 5 to 40 million, our GPU MPM achieves over 100x per-time-step speedup on a workstation with an Intel 8086K CPU and a single Quadro P6000 GPU, exposing exciting possibilities for future MPM simulations in computer graphics and computational science. Moreover, compared to the state-of-the-art GPU MPM method [Hu et al. 2019a], we not only achieve 2x acceleration on a single GPU, but our kernel fusion strategy and Array-of-Structs-of-Array (AoSoA) data structure design also generalize to multi-GPU systems. Our multi-GPU MPM exhibits near-perfect weak and strong scaling with 4 GPUs, enabling performant and large-scale simulations on a 1024^3 grid with close to 100 million particles at less than 4 minutes per frame on a single 4-GPU workstation, and with 134 million particles at less than 1 minute per frame on an 8-GPU workstation.
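The abstract credits part of the speedup to an Array-of-Structs-of-Array (AoSoA) particle layout that promotes coalesced memory access. The sketch below is only an illustration of the general AoSoA indexing idea, not the paper's implementation: the bin width `LANES`, the attribute set `ATTRS`, and the helper names `aosoa_offset`, `aos_offset`, and `pack_aosoa` are all hypothetical. It compares the flat-memory offsets that neighboring particles touch when reading one attribute under a plain array-of-structs layout and under AoSoA.

```python
# Illustrative AoSoA indexing sketch. All names and sizes are assumptions
# chosen for the example; they are not taken from the paper.

LANES = 4  # hypothetical number of particles per bin
ATTRS = ("mass", "pos_x", "pos_y", "pos_z")  # hypothetical attribute set


def aosoa_offset(particle, attr, lanes=LANES, num_attrs=len(ATTRS)):
    """Flat offset of one attribute of one particle in an AoSoA buffer.

    Particles are grouped into bins of `lanes` particles. Inside a bin,
    each attribute is stored as a contiguous run of `lanes` values.
    """
    bin_index, lane = divmod(particle, lanes)
    return bin_index * num_attrs * lanes + attr * lanes + lane


def aos_offset(particle, attr, num_attrs=len(ATTRS)):
    """Flat offset in a plain array-of-structs buffer, for comparison."""
    return particle * num_attrs + attr


def pack_aosoa(particles, lanes=LANES):
    """Copy a list of per-particle tuples into a flat AoSoA buffer."""
    num_attrs = len(ATTRS)
    num_bins = -(-len(particles) // lanes)  # ceiling division
    flat = [0.0] * (num_bins * num_attrs * lanes)
    for p, record in enumerate(particles):
        for a, value in enumerate(record):
            flat[aosoa_offset(p, a, lanes, num_attrs)] = value
    return flat


# Six toy particles: (mass, pos_x, pos_y, pos_z).
particles = [(float(p), p + 0.1, p + 0.2, p + 0.3) for p in range(6)]
flat = pack_aosoa(particles)

pos_x = ATTRS.index("pos_x")
# Offsets read by four neighboring "threads" fetching pos_x.
aosoa_offsets = [aosoa_offset(p, pos_x) for p in range(LANES)]
aos_offsets = [aos_offset(p, pos_x) for p in range(LANES)]

print("AoSoA offsets:", aosoa_offsets)  # consecutive addresses
print("AoS offsets:  ", aos_offsets)    # strided by the struct size
```

Under these assumptions, the four neighboring reads of `pos_x` fall on consecutive offsets in the AoSoA buffer, whereas the array-of-structs layout spreads them one struct apart. Consecutive addresses are what lets a GPU service a group of threads with few memory transactions.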
Pages: 15