Optimizing Parallel Sn Sweeps on Unstructured Grids for Multi-Core Clusters

Cited by: 2

Authors:

Jie Yan (闫洁) [1,2]

Guang-Ming Tan (谭光明) [1]

Ning-Hui Sun (孙凝晖) [1]

Affiliations:

[1] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences

[2] University of Chinese Academy of Sciences

Source:

Journal of Computer Science & Technology | 2013 / Vol. 28 / No. 04

Funding:

National Natural Science Foundation of China

Keywords:

parallel Sn sweep; unstructured grid; data-driven algorithm;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP393.01

Discipline classification codes:

081201; 1201

Abstract:

In particle transport simulations, radiation effects are often described by the discrete ordinates (Sn) form of the Boltzmann equation. In each ordinate direction, the solution is computed by sweeping the radiation flux across the grid. A parallel Sn sweep on an unstructured grid can be explicitly modeled as a topological traversal of an equivalent directed acyclic graph (DAG), which is a data-driven algorithm. Its traditional design, based on the MPI model, results in irregular communication of a massive number of short messages, which the MPI runtime cannot handle efficiently. Meanwhile, in high-end HPC cluster systems, multicore has become the standard processor configuration of a single node. The traditional data-driven algorithm for Sn sweeps has not exploited the potential advantages of multi-threading on shared-memory multicore nodes. These advantages, however, as we shall demonstrate, could provide an elegant solution to the problems in the previous MPI-only design. In this paper, we give a new design of data-driven parallel Sn sweeps using hybrid MPI and Pthread programming, namely Sweep-H, to exploit the hierarchical parallelism of processes and threads. With special multi-threading techniques and a vertex scheduling policy, Sweep-H achieves more efficient communication and better load balance. We further present an analytical performance model for Sweep-H to reveal why and when it is advantageous over its former MPI counterpart. On a 64-node multicore cluster system with 12 cores per node, 768 cores in total, Sweep-H achieves nearly linear scalability for moderate problem sizes, and better absolute performance than the previous MPI algorithm on more than 16 nodes (up to a twofold speedup on 64 nodes).
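
The abstract's view of a sweep as a topological traversal of a DAG can be illustrated with a minimal, single-process sketch. This is not Sweep-H and involves neither MPI nor Pthreads; it only shows the data-driven rule that a cell becomes ready once all of its upwind neighbours have been solved. The function name `sweep_order` and the `(u, v)` edge format are assumptions made for illustration.

```python
from collections import deque


def sweep_order(num_cells, upwind_edges):
    """Return one valid sweep order for a single ordinate direction.

    upwind_edges holds (u, v) pairs meaning cell v needs the outgoing
    flux of cell u before it can be solved (hypothetical input format).
    """
    downwind = [[] for _ in range(num_cells)]
    pending = [0] * num_cells  # count of unsolved upwind neighbours per cell
    for u, v in upwind_edges:
        downwind[u].append(v)
        pending[v] += 1

    # Cells with no upwind dependency are ready immediately.
    ready = deque(c for c in range(num_cells) if pending[c] == 0)
    order = []
    while ready:
        cell = ready.popleft()
        order.append(cell)  # stand-in for solving the flux in this cell
        for nxt in downwind[cell]:
            pending[nxt] -= 1
            if pending[nxt] == 0:  # all inputs arrived: data-driven trigger
                ready.append(nxt)

    if len(order) != num_cells:
        raise ValueError("dependency graph contains a cycle")
    return order


# Four cells; cell 3 depends on cells 1 and 2, which both depend on cell 0.
print(sweep_order(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))  # prints [0, 1, 2, 3]
```

In a distributed setting, the decrement of `pending` for a cell owned by another process would correspond to a message, which is one way to picture the many short, irregular messages the abstract mentions.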

Pages: 657-670

Page count: 14
