Symmetric Indefinite Linear Solver Using OpenMP Task on Multicore Architectures

Cited by: 7

Authors:

Yamazaki, Ichitaro [1]

Kurzak, Jakub [1]

Wu, Panruo [1]

Zounon, Mawussi [2]

Dongarra, Jack [2]

Affiliations:

[1] Univ Tennessee, Elect Engn & Comp Sci, Knoxville, TN 37996 USA

[2] Univ Manchester, Sch Math, Manchester M13 9PL, Lancs, England

Source:

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS | 2018 / Vol. 29 / No. 08

Keywords:

Linear algebra; symmetric indefinite matrices; multithreading; runtime

DOI:

10.1109/TPDS.2018.2808964

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Recently, the Open Multi-Processing (OpenMP) standard has incorporated task-based programming, where a function call with input and output data is treated as a task. At run time, OpenMP's superscalar scheduler tracks the data dependencies among the tasks and executes the tasks as their dependencies are resolved. On a shared-memory architecture with multiple cores, the independent tasks are executed on different cores in parallel, thereby enabling parallel execution of a seemingly sequential code. With the emergence of many-core architectures, this type of programming paradigm is gaining attention, not only because of its simplicity but also because it breaks the artificial synchronization points of the program and improves its thread-level parallelization. In this paper, we use these new OpenMP features to develop a portable high-performance implementation of a dense symmetric indefinite linear solver. Obtaining high performance from this kind of solver is a challenge because the symmetric pivoting, which is required to maintain numerical stability, leads to data dependencies that prevent us from using some common performance-improving techniques. We describe several techniques that fully utilize a large number of cores through tasking while conforming to the OpenMP standard. Our performance results on current many-core architectures (Intel's Broadwell, Intel's Knights Landing, IBM's Power8, and Arm's ARMv8) demonstrate the portable and superior performance of our implementation compared with the Linear Algebra PACKage (LAPACK). The resulting solver is now available as a part of the PLASMA software package.
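The task model described in the abstract can be illustrated with a minimal sketch in C. It is not code from the paper or from PLASMA: the helper names `fill`, `add`, and `run_tasks` are hypothetical, and the example assumes a compiler with OpenMP 4.0 or later (for instance, built with `-fopenmp`). Each function call is wrapped in a task, and the `depend` clauses declare its inputs and outputs so that the runtime can run the two independent `fill` tasks concurrently and start `add` only after both have finished.

```c
#include <stdio.h>

/* Hypothetical helpers for illustration; not taken from the paper or PLASMA. */
static void fill(double *x, int n, double value) {
    for (int i = 0; i < n; i++) x[i] = value;
}

static void add(const double *x, const double *y, double *z, int n) {
    for (int i = 0; i < n; i++) z[i] = x[i] + y[i];
}

/* Runs three tasks whose ordering is derived from their data dependencies
 * and returns the first element of the result. */
double run_tasks(void) {
    enum { N = 4 };
    double a[N], b[N], c[N];

    #pragma omp parallel
    #pragma omp single
    {
        /* These two tasks write different arrays, so they are independent
         * and may execute on different cores at the same time. */
        #pragma omp task depend(out: a)
        fill(a, N, 1.0);

        #pragma omp task depend(out: b)
        fill(b, N, 2.0);

        /* This task reads a and b, so the runtime defers it until both
         * producer tasks have completed. */
        #pragma omp task depend(in: a, b) depend(out: c)
        add(a, b, c, N);

        /* Wait for all tasks created above before leaving the block. */
        #pragma omp taskwait
    }
    return c[0];
}
```

Calling `run_tasks()` from a `main` function and printing the result is enough to try it. If the code is compiled without OpenMP support, the pragmas are ignored and the calls run sequentially in program order, which gives the same result; this is the sense in which the source code looks sequential while the runtime extracts the parallelism.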

Pages: 1879-1892

Number of pages: 14

50 items in total

[1] A Parallel Tiled Solver for Dense Symmetric Indefinite Systems on Multicore Architectures
Baboulin, Marc
Becker, Dulceneia
Dongarra, Jack
2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2012, : 14 - 24
[2] OpenMP on multicore architectures
Terboven, Christian
Mey, Dieter an
Sarholz, Samuel
PRACTICAL PROGRAMMING MODEL FOR THE MULTI-CORE ERA, PROCEEDINGS, 2008, 4935 : 54 - 64
[3] A Sparse Symmetric Indefinite Direct Solver for GPU Architectures
Hogg, Jonathan D.
Ovtchinnikov, Evgueni
Scott, Jennifer A.
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2016, 42 (01):
[4] Accelerating the Iterative Linear Solver for Reservoir Simulation on Multicore Architectures
Wu, Wei
Li, Xiang
He, Lei
Zhang, Dongxiao
2014 20TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2014, : 265 - 272
[5] Optimisation Techniques for Multicore Architectures and Parallel Processing using OpenMP
Ataullah, Sara Tabassum
Siddique, Mohammed
2021 INTERNATIONAL CONFERENCE ON DECISION AID SCIENCES AND APPLICATION (DASA), 2021,
[6] Task-parallel tiled direct solver for dense symmetric indefinite systems
Shen, Zhongyu
Zhang, Jilin
Suzuki, Tomohiro
PARALLEL COMPUTING, 2022, 111
[7] PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP
Dongarra, Jack
Gates, Mark
Haidar, Azzam
Kurzak, Jakub
Luszczek, Piotr
Wu, Panruo
Yamazaki, Ichitaro
Yarkhan, Asim
Abalenkovs, Maksims
Bagherpour, Negin
Hammarling, Sven
Sistek, Jakub
Stevens, David
Zounon, Mawussi
Relton, Samuel D.
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2019, 45 (02):
[8] Task-based multifrontal QR solver for GPU-accelerated multicore architectures
Agullo, Emmanuel
Buttari, Alfredo
Guermouche, Abdou
Lopez, Florent
2015 IEEE 22ND INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), 2015, : 54 - 63
[9] Performance Evaluation of MPI, UPC and OpenMP on Multicore Architectures
Mallon, Damian A.
Taboada, Guillermo L.
Teijeiro, Carlos
Tourino, Juan
Fraguela, Basilio B.
Gomez, Andres
Doallo, Ramon
Carlos Mourino, J.
RECENT ADVANCES IN PARALLEL VIRTUAL MACHINE AND MESSAGE PASSING INTERFACE, PROCEEDINGS, 2009, 5759 : 174 - +
[10] Scheduling dynamic OpenMP applications over multicore architectures
Broquedis, Francois
Diakhate, Francois
Thibault, Samuel
Aumage, Olivier
Namyst, Raymond
Wacrenier, Pierre-Andre
OPENMP IN A NEW ERA OF PARALLELISM, PROCEEDINGS, 2008, 5004 : 170 - 180