Massively parallel lattice-Boltzmann codes on large GPU clusters

Cited by: 48
Authors
Calore, E. [1, 2]
Gabbana, A. [1]
Kraus, J. [3]
Pellegrini, E. [1]
Schifano, S. F. [1, 2]
Tripiccione, R. [1, 2]
Affiliations
[1] Univ Ferrara, Via Saragat 1, I-44122 Ferrara, Italy
[2] INFN Ferrara, Via Saragat 1, I-44122 Ferrara, Italy
[3] NVIDIA GmbH, Adenauerstr 20 A4, D-52146 Würselen, Germany
Keywords
Lattice-Boltzmann; GPU accelerators; Massively parallel programming; Heterogeneous systems; Performance; Portability
DOI
10.1016/j.parco.2016.08.005
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper describes a massively parallel code for a state-of-the-art thermal lattice-Boltzmann method. Our code has been carefully optimized for performance on one GPU and for good scaling behavior extending to a large number of GPUs. Versions of this code have already been used for large-scale studies of convective turbulence. GPUs are becoming increasingly popular in HPC applications, as they are able to deliver higher performance than traditional processors. Writing efficient programs for large clusters is not an easy task, as codes must adapt to increasingly parallel architectures and the overheads of node-to-node communications must be properly handled. We describe the structure of our code, discussing several key design choices that were guided by theoretical models of performance and experimental benchmarks. We present an extensive set of performance measurements and identify the corresponding main bottlenecks; finally, we compare the results of our GPU code with those measured on other currently available high-performance processors. Our results are a production-grade code able to deliver a sustained performance of several tens of Tflops, as well as a design and optimization methodology that can be used for the development of other high-performance applications for computational physics. (C) 2016 Elsevier B.V. All rights reserved.
Pages: 1-24
Number of pages: 24
Related papers
50 in total
  • [21] Parallel lattice-Boltzmann simulation of fluid flow in centrifugal elutriation chambers
    Kandhai, D
    Dubbeldam, D
    Hoekstra, AG
    Sloot, PMA
    HIGH-PERFORMANCE COMPUTING AND NETWORKING, 1998, 1401 : 173 - 182
  • [22] Lattice Boltzmann for Large-Scale GPU Systems
    Gray, Alan
    Hart, Alistair
    Richardson, Alan
    Stratford, Kevin
    APPLICATIONS, TOOLS AND TECHNIQUES ON THE ROAD TO EXASCALE COMPUTING, 2012, 22 : 167 - 174
  • [23] Cache performance optimizations for parallel lattice Boltzmann codes
    Wilke, J
    Pohl, T
    Kowarschik, M
    Rüde, U
    EURO-PAR 2003 PARALLEL PROCESSING, PROCEEDINGS, 2003, 2790 : 441 - 450
  • [24] Lattice-Boltzmann Water Waves
    Geist, Robert
    Corsi, Christopher
    Tessendorf, Jerry
    Westall, James
    ADVANCES IN VISUAL COMPUTING, PT I, 2010, 6453 : 74 - 85
  • [25] Accuracy of the lattice-Boltzmann method
    Maier, RS
    Bernard, RS
    INTERNATIONAL JOURNAL OF MODERN PHYSICS C, 1997, 8 (04): : 747 - 752
  • [26] Heterogeneous CPU plus GPU approaches for mesh refinement over Lattice-Boltzmann simulations
    Valero-Lara, Pedro
    Jansson, Johan
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2017, 29 (07):
  • [27] PARALLEL ACTIVE CONTOUR WITH LATTICE BOLTZMANN SCHEME ON MODERN GPU
    Sun, Xiuyu
    Wang, Zhiqiang
    Chen, George
    2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012), 2012, : 1709 - 1712
  • [28] putation On Large GPU Clusters for Lattice QCD
    Shi, Guochun
    Babich, Ronald
    Clark, Michael A.
    Joo, Balint
    Gottlieb, Steven
    Kindratenko, Volodymyr
    2012 SYMPOSIUM ON APPLICATION ACCELERATORS IN HIGH PERFORMANCE COMPUTING (SAAHPC), 2012, : 1 - 10
  • [29] Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units
    Xiong QinGang
    Li Bo
    Xu Ji
    Fang XiaoJian
    Wang XiaoWei
    Wang LiMin
    He XianFeng
    Ge Wei
    CHINESE SCIENCE BULLETIN, 2012, 57 (07): : 707 - 715
  • [30] Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units
    XIONG QinGang
    Graduate University of Chinese Academy of Sciences
    Science Bulletin, 2012, (07) : 707 - 715