Performance issues on many-core processors: A D2Q37 Lattice Boltzmann scheme as a test-case

Cited by: 16
Authors
Mantovani, F. [1]
Pivanti, M. [2]
Schifano, S. F. [3,4]
Tripiccione, R. [4,5,6]
Affiliations
[1] Univ Regensburg, Fac Phys, D-93053 Regensburg, Germany
[2] Univ Roma La Sapienza, Dipartimento Fis, Rome, Italy
[3] Univ Ferrara, Dipartimento Matemat & Informat, I-44100 Ferrara, Italy
[4] Ist Nazl Fis Nucl, Milan, Italy
[5] Univ Ferrara, Dipartimento Fis, I-44100 Ferrara, Italy
[6] Univ Ferrara, CMCS, I-44100 Ferrara, Italy
Keywords
Computational fluid-dynamics; Lattice Boltzmann methods; Many-core architectures; Performance analysis
DOI
10.1016/j.compfluid.2013.05.014
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Performance on recent processor architectures relies heavily on the ability of applications and compilers to exploit an increasingly large and diverse set of parallel features. In this paper we focus on issues related to the efficient programming of multi-core processors based on the Sandy Bridge micro-architecture recently introduced by Intel. As a test-case application we use a D2Q37 Lattice Boltzmann algorithm, which accurately reproduces the thermo-hydrodynamics of a 2D fluid obeying the equations of state of a perfect gas. The regular structure and the high degree of parallelism available in this class of applications make it relatively easy to exploit several processor features relevant for performance, such as the new Advanced Vector Extensions (AVX) SIMD instruction set. However, the main challenge is how to map the application efficiently onto the hardware structure of the processor. In this paper we present the implementation of our Lattice Boltzmann code on the Sandy Bridge processor, and assess the efficiency of several programming strategies and data-structure organizations, in terms of both memory access and computing performance. We also compare our results with those obtained on previous-generation Intel processors and with recent NVIDIA GP-GPU computing systems. (C) 2013 Elsevier Ltd. All rights reserved.
Pages: 743-752
Number of pages: 10