Optimizing Parallel Sn Sweeps on Unstructured Grids for Multi-Core Clusters

Cited by: 2
Authors
Jie Yan (闫洁) [1, 2]
Guang-Ming Tan (谭光明) [1]
Ning-Hui Sun (孙凝晖) [1]
Affiliations
[1] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences
[2] University of Chinese Academy of Sciences
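Funding
National Natural Science Foundation of China
Keywords
parallel Sn sweep; unstructured grid; data-driven algorithm
DOI
Not available
Chinese Library Classification (CLC) number
TP393.01
Discipline classification code
081201; 1201
Abstract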
In particle transport simulations, radiation effects are often described by the discrete ordinates (Sn) form of the Boltzmann equation. In each ordinate direction, the solution is computed by sweeping the radiation flux across the grid. A parallel Sn sweep on an unstructured grid can be explicitly modeled as a topological traversal through an equivalent directed acyclic graph (DAG), which is a data-driven algorithm. Its traditional design using the MPI model results in irregular communication of massive numbers of short messages, which cannot be efficiently handled by the MPI runtime. Meanwhile, in high-end HPC cluster systems, multicore has become the standard processor configuration of a single node. The traditional data-driven algorithm for Sn sweeps has not exploited the potential advantages of multi-threading on shared-memory multicore nodes. These advantages, however, as we shall demonstrate, could provide an elegant solution to the problems in the previous MPI-only design. In this paper, we give a new design of data-driven parallel Sn sweeps using hybrid MPI and Pthread programming, namely Sweep-H, to exploit the hierarchical parallelism of processes and threads. With special multi-threading techniques and a vertex scheduling policy, Sweep-H achieves more efficient communication and better load balance. We further present an analytical performance model for Sweep-H to reveal why and when it is advantageous over its former MPI counterpart. On a 64-node multicore cluster system with 12 cores per node, 768 cores in total, Sweep-H achieves nearly linear scalability for moderate problem sizes, and better absolute performance than the previous MPI algorithm on more than 16 nodes (up to a twofold speedup on 64 nodes).
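The abstract's view of a sweep as a topological traversal of a DAG can be illustrated with a minimal, single-process sketch. This is not Sweep-H or the authors' code: it uses a plain Kahn-style traversal, and the names `sweep_order`, `upwind_edges`, and `pending` are hypothetical, chosen only for illustration. It assumes the dependencies for one ordinate direction are already given as (upwind cell, downwind cell) pairs.

```python
from collections import deque


def sweep_order(num_cells, upwind_edges):
    """Return one valid sweep order for a single ordinate direction.

    upwind_edges holds (u, v) pairs meaning cell v needs the outgoing
    flux of cell u before it can be solved (hypothetical input format).
    """
    downwind = [[] for _ in range(num_cells)]
    pending = [0] * num_cells  # number of unsolved upwind neighbours
    for u, v in upwind_edges:
        downwind[u].append(v)
        pending[v] += 1

    # Cells with no upwind dependency are ready immediately.
    ready = deque(c for c in range(num_cells) if pending[c] == 0)
    order = []
    while ready:
        cell = ready.popleft()
        order.append(cell)  # a real solver would compute the cell flux here
        for nxt in downwind[cell]:
            pending[nxt] -= 1
            if pending[nxt] == 0:  # all upwind data has arrived
                ready.append(nxt)

    if len(order) != num_cells:
        raise ValueError("dependency graph contains a cycle")
    return order
```

A cell becomes ready only when data from all of its upwind neighbours has arrived, which is the sense in which the traversal is data-driven. In a distributed setting, each decrement of `pending` for a cell owned by another process would correspond to a short message, which is the communication pattern the abstract identifies as costly for an MPI-only design.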
Pages: 657-670
Page count: 14
Related papers
50 in total
  • [1] Optimizing Parallel Sn Sweeps on Unstructured Grids for Multi-Core Clusters
    Jie Yan
    Guang-Ming Tan
    Ning-Hui Sun
    [J]. Journal of Computer Science and Technology, 2013, 28 : 657 - 670
  • [2] Optimizing Parallel Sn Sweeps on Unstructured Grids for Multi-Core Clusters
    Yan, Jie
    Tan, Guang-Ming
    Sun, Ning-Hui
    [J]. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2013, 28 (04) : 657 - 670
  • [3] Parallel Sn sweeps on unstructured grids: Algorithms for prioritization, grid partitioning, and cycle detection
    Plimpton, SJ
    Hendrickson, B
    Burns, SP
    McLendon, W
    Rauchwerger, L
    [J]. NUCLEAR SCIENCE AND ENGINEERING, 2005, 150 (03) : 267 - 283
  • [4] An algorithm for parallel Sn sweeps on unstructured meshes
    Pautz, SD
    [J]. NUCLEAR SCIENCE AND ENGINEERING, 2002, 140 (02) : 111 - 136
  • [5] Parallel algorithms for Sn transport sweeps on unstructured meshes
    Colomer, G.
    Borrell, R.
    Trias, F. X.
    Rodriguez, I.
    [J]. JOURNAL OF COMPUTATIONAL PHYSICS, 2013, 232 (01) : 118 - 135
  • [7] Performance Pattern of Unified Parallel C on Multi-Core Clusters
    Hamid, Nor Asilah Wati Abdul
    Serres, Olivier
    Anbar, Ahmad
    Hassan, Sazlinah
    [J]. 2013 IEEE 15TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2013 IEEE INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (HPCC_EUC), 2013, : 1751 - 1757
  • [8] Parallel Algorithm Study of Petri net Based on Multi-core Clusters
    Li, Wenjing
    Lin, Zhong-ming
    Pan, Ying
    Tang, Ze-yu
    [J]. 14TH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED COMPUTING AND APPLICATIONS FOR BUSINESS, ENGINEERING AND SCIENCE (DCABES 2015), 2015, : 54 - 57
  • [9] Optimizing image processing on multi-core CPUs with Intel parallel programming technologies
    Kim, Cheong Ghil
    Kim, Jeom Goo
    Lee, Do Hyeon
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2014, 68 (02) : 237 - 251
  • [10] Optimizing parallel matrix transpose algorithm on multi-core digital signal processors
    Pei, Xiangdong
    Wang, Qinglin
    Liao, Linyu
    Li, Rongchun
    Mei, Songzhu
    Liu, Jie
    Pang, Zhengbin
    [J]. Guofang Keji Daxue Xuebao/Journal of National University of Defense Technology, 2023, 45 (01) : 57 - 66