Scalable NUMA-Aware Wilson-Dirac on Supercomputers

Cited by: 2
Authors
Tadonki, Claude [1]
Affiliations
[1] PSL Res Univ, Mines ParisTech, CRI, 35 Rue St Honore, F-77305 Fontainebleau, France
DOI
10.1109/HPCS.2017.56
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
We revisit the Wilson-Dirac operator, also referred to as Dslash, on NUMA manycore vector machines, seeking an efficient supercomputing implementation. Quantum ChromoDynamics (QCD) is the theory of the strong nuclear force, and its discrete formalism is Lattice Quantum ChromoDynamics (LQCD). Wilson-Dirac is the major computing kernel in LQCD, where special attention is paid to large-scale simulations. The corresponding computing demand is tremendous at various levels, from storage to floating-point operations, hence the crucial need for powerful supercomputers. Designing efficient LQCD codes on modern (mostly hybrid) supercomputers requires efficiently exploiting all available levels of parallelism, including accelerators. Since Wilson-Dirac is a coarse-grain stencil computation performed on a huge volume of data, any investigation of performance and scalability must carefully address memory accesses and interprocessor communication overheads. To lower the latter, explicit shared-memory implementations should be considered at the level of a compute node, since this leads to a less complex data communication graph and thus (at least intuitively) reduces the overall communication latency. We focus on this aspect and propose a novel, efficient NUMA-aware scheduling, together with a combination of the major HPC strategies for large-scale LQCD. We reach nearly optimal performance on a single core and a significant scalability improvement on several NUMA nodes. Then, using a classical domain decomposition approach, we extend our scheduling to a large cluster of many-core nodes, thus illustrating the global efficiency of our hybrid implementation.
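The abstract's intuition that node-level shared memory simplifies communication can be illustrated with a rough count of the boundary sites exchanged in a domain decomposition of a 4D lattice. The sketch below is not the paper's scheduling: the lattice size, the node and core counts, and the function name `total_halo` are all hypothetical, and the model only counts nearest-neighbour face sites on a periodic lattice, ignoring the amount of data stored per site.

```python
from math import prod


def total_halo(global_dims, grid):
    """Count face sites exchanged between processes for a nearest-neighbour stencil.

    global_dims: lattice extent in each of the 4 dimensions (hypothetical values).
    grid: number of processes along each dimension.
    Assumes a periodic lattice; a dimension that is not split (1 process)
    wraps onto the same process and therefore needs no exchange.
    """
    if any(g % p != 0 for g, p in zip(global_dims, grid)):
        raise ValueError("each lattice extent must be divisible by the process grid")

    local = [g // p for g, p in zip(global_dims, grid)]
    volume = prod(local)

    # Two faces (forward and backward) per split dimension;
    # a face orthogonal to dimension d holds volume // local[d] sites.
    per_process = sum(
        2 * (volume // extent) for extent, p in zip(local, grid) if p > 1
    )
    return prod(grid) * per_process


# Hypothetical machine: 4 nodes with 8 cores each, 32^3 x 64 lattice.
lattice = (32, 32, 32, 64)
one_process_per_core = total_halo(lattice, (2, 2, 2, 4))  # 32 processes
one_process_per_node = total_halo(lattice, (1, 1, 1, 4))  # 4 processes, threads inside

print(one_process_per_core, one_process_per_node)
```

In this toy configuration, running one process per node (with shared memory inside the node) exchanges a quarter of the face sites that one process per core does, and each process has neighbours along a single dimension instead of four.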
Pages: 315-324
Page count: 10
Related papers
50 in total
  • [1] Scalable Adaptive NUMA-Aware Lock
    Zhang, Mingzhe
    Chen, Haibo
    Cheng, Luwei
    Lau, Francis C. M.
    Wang, Cho-Li
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2017, 28 (06) : 1754 - 1769
  • [2] Scalable NUMA-aware Blocking Synchronization Primitives
    Kashyap, Sanidhya
    Min, Changwoo
    Kim, Taesoo
    [J]. 2017 USENIX ANNUAL TECHNICAL CONFERENCE (USENIX ATC '17), 2017, : 603 - 615
  • [3] NEMO: NUMA-aware Concurrency Control for Scalable Transactional Memory
    Mohamedin, Mohamed
    Peluso, Sebastiano
    Kishi, Masoomeh Javidi
    Hassan, Ahmed
    Palmieri, Roberto
    [J]. PROCEEDINGS OF THE 47TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, 2018,
  • [4] NUMA-aware Scalable Graph Traversal on SGI UV Systems
    Yasui, Yuichiro
    Fujisawa, Katsuki
    Goh, Eng Lim
    Baron, John
    Sugiura, Atsushi
    Uchiyama, Takashi
    [J]. PROCEEDINGS OF THE ACM WORKSHOP ON HIGH PERFORMANCE GRAPH PROCESSING (HPGP'16), 2016, : 19 - 26
  • [5] On Designing NUMA-Aware Concurrency Control for Scalable Transactional Memory
    Mohamedin, Mohamed
    Palmieri, Roberto
    Peluso, Sebastiano
    Ravindran, Binoy
    [J]. ACM SIGPLAN NOTICES, 2016, 51 (08) : 393 - 394
  • [6] NUMA-Aware Scalable and Efficient In-Memory Aggregation on Large Domains
    Wang, Li
    Zhou, Minqi
    Zhang, Zhenjie
    Shan, Ming-Chien
    Zhou, Aoying
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (04) : 1071 - 1084
  • [7] Compact NUMA-aware Locks
    Dice, Dave
    Kogan, Alex
    [J]. PROCEEDINGS OF THE FOURTEENTH EUROSYS CONFERENCE 2019 (EUROSYS '19), 2019,
  • [8] NUMA-Aware Task Performance Analysis
    Schmidl, Dirk
    Mueller, Matthias S.
    [J]. OpenMP: Memory, Devices, and Tasks, 2016, 9903 : 77 - 88
  • [9] A NUMA-Aware Recoverable Mutex Lock
    Fahmy, Ahmed
    Golab, Wojciech
    [J]. PROCEEDINGS OF THE 34TH ACM SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES, SPAA 2022, 2022, : 295 - 305
  • [10] A NUMA-Aware Recoverable Mutex Lock
    Fahmy, Ahmed
    Golab, Wojciech
    [J]. Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2022, : 295 - 305