Sailfish: A flexible multi-GPU implementation of the lattice Boltzmann method

Cited by: 82
Authors
Januszewski, M. [1, 2]
Kostur, M. [1]
Affiliations
[1] Univ Silesia, Inst Phys, PL-40007 Katowice, Poland
[2] Google Switzerland GmbH, CH-8002 Zurich, Switzerland
Keywords
Lattice Boltzmann; LBM; Computational fluid dynamics; Graphics processing unit; GPU; CUDA; Boundary conditions; Binary fluid; Simulation; Viscosities; Flows
DOI
10.1016/j.cpc.2014.04.018
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
We present Sailfish, an open source fluid simulation package implementing the lattice Boltzmann method (LBM) on modern Graphics Processing Units (GPUs) using CUDA/OpenCL. We take a novel approach to GPU code implementation and use run-time code generation techniques and a high level programming language (Python) to achieve state of the art performance, while allowing easy experimentation with different LBM models and tuning for various types of hardware. We discuss the general design principles of the code, scaling to multiple GPUs in a distributed environment, as well as the GPU implementation and optimization of many different LBM models, both single component (BGK, MRT, ELBM) and multicomponent (Shan-Chen, free energy). The paper also presents results of performance benchmarks spanning the last three NVIDIA GPU generations (Tesla, Fermi, Kepler), which we hope will be useful for researchers working with this type of hardware and similar codes.
Program Summary
Program title: Sailfish
Catalogue identifier: AETA_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AETA_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GNU Lesser General Public License, version 3
No. of lines in distributed program, including test data, etc.: 225864
No. of bytes in distributed program, including test data, etc.: 46861049
Distribution format: tar.gz
Programming language: Python, CUDA C, OpenCL.
Computer: Any with an OpenCL or CUDA-compliant GPU.
Operating system: No limits (tested on Linux and Mac OS X).
RAM: Hundreds of megabytes to tens of gigabytes for typical cases.
Classification: 12, 6.5.
External routines: PyCUDA/PyOpenCL, Numpy, Mako, ZeroMQ (for multi-GPU simulations), scipy, sympy
Nature of problem: GPU-accelerated simulation of single- and multi-component fluid flows.
Solution method: A wide range of relaxation models (LBGK, MRT, regularized LB, ELBM, Shan-Chen, free energy, free surface) and boundary conditions within the lattice Boltzmann method framework. Simulations can be run in single or double precision using one or more GPUs.
Restrictions: The lattice Boltzmann method works for low Mach number flows only.
Unusual features: The actual numerical calculations run exclusively on GPUs. The numerical code is built dynamically at run-time in CUDA C or OpenCL, using templates and symbolic formulas. The high-level control of the simulation is maintained by a Python process.
Additional comments: The distribution file for this program is over 45 Mbytes and therefore is not delivered directly when Download or Email is requested. Instead a html file giving details of how the program can be obtained is sent.
Running time: Problem-dependent, typically minutes (for small cases or short simulations) to hours (large cases or long simulations).
(C) 2014 Elsevier B.V. All rights reserved.
Pages: 2350 - 2368
Number of pages: 19
Related papers
50 in total
  • [41] Lattice Boltzmann Simulations at Petascale on Multi-GPU Systems with Asynchronous Data Transfer and Strictly Enforced Memory Read Alignment
    Robertsen, Fredrik
    Westerholm, Jan
    Mattila, Keijo
    [J]. 23RD EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP 2015), 2015, : 604 - 609
  • [42] Multi-GPU implementation of a VMAT treatment plan optimization algorithm
    Tian, Zhen
    Peng, Fei
    Folkerts, Michael
    Tan, Jun
    Jia, Xun
    Jiang, Steve B.
    [J]. MEDICAL PHYSICS, 2015, 42 (06) : 2841 - 2852
  • [44] Efficient implementation of data flow graphs on multi-gpu clusters
    Boulos, Vincent
    Huet, Sylvain
    Fristot, Vincent
    Salvo, Luc
    Houzet, Dominique
    [J]. JOURNAL OF REAL-TIME IMAGE PROCESSING, 2014, 9 (01) : 217 - 232
  • [45] An implementation of the Social Distances Model using multi-GPU systems
    Klusek, Adrian
    Topa, Pawel
    Was, Jaroslaw
    Lubas, Robert
    [J]. INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2018, 32 (04) : 482 - 495
  • [46] Multi-GPU immersed boundary method hemodynamics simulations
    Ames, Jeff
    Puleri, Daniel F.
    Balogh, Peter
    Gounley, John
    Draeger, Erik W.
    Randles, Amanda
    [J]. JOURNAL OF COMPUTATIONAL SCIENCE, 2020, 44
  • [47] A MULTI-GPU SOURCES RECONSTRUCTION METHOD FOR IMAGING APPLICATIONS
    Lopez-Portugues, Miguel
    Alvarez, Yuri
    Lopez-Fernandez, Jesus A.
    Garcia, Cebrian
    Ayestaran, Rafael G.
    Las-Heras, Fernando
    [J]. PROGRESS IN ELECTROMAGNETICS RESEARCH-PIER, 2013, 136 : 703 - 724
  • [48] A Multi-GPU Approach For The Exchange Monte Carlo Method
    Navarro, Cristobal A.
    Wei, Huang
    Deng, Youjin
    [J]. 2015 34TH INTERNATIONAL CONFERENCE OF THE CHILEAN COMPUTER SCIENCE SOCIETY (SCCC), 2015,
  • [49] COMPARISON OF CPU AND GPU IMPLEMENTATIONS OF THE LATTICE BOLTZMANN METHOD
    McClure, James E.
    Prins, Jan F.
    Miller, Cass T.
    [J]. PROCEEDINGS OF THE XVIII INTERNATIONAL CONFERENCE ON COMPUTATIONAL METHODS IN WATER RESOURCES (CMWR 2010), 2010, : 1027 - 1034
  • [50] GPU Based Parallel Computing of Lattice Boltzmann Method
    Zhang, Ruoxing
    Chou, Qiang
    Wang, Haidan
    Ge, Daochuan
    [J]. INTERNATIONAL CONFERENCE ON COMPUTATIONAL AND INFORMATION SCIENCES (ICCIS 2014), 2014, : 43 - 49