NUMA-Aware Multicore Matrix Multiplication

Cited by: 1

Authors:

Alkowaileet, Wail Y. [1]

Carrillo-Cisneros, David [1]

Lim, Robert V. [1]

Scherson, Isaac D. [1]

Affiliations:

[1] Univ Calif Irvine, Dept Comp Sci Syst, Irvine, CA 92697 USA

Source:

PARALLEL PROCESSING LETTERS | 2014 / Vol. 24 / Issue 04

Keywords:

ccNUMA; matrix multiplication; multicore; multi-socket

DOI:

10.1142/S0129626414500066

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Subject classification codes:

081203; 0835

Abstract:

A user-level scheduling scheme, combined with a specific data alignment, is presented for matrix multiplication on cache-coherent Non-Uniform Memory Access (ccNUMA) architectures. Addressing the data locality problem that can occur in such systems potentially alleviates memory bottlenecks. We show experimentally that a thread scheduler that is agnostic to data placement on a ccNUMA machine (e.g., OpenMP 3.1) produces a high number of cache misses. To overcome this memory contention problem, we show how proper memory mapping and scheduling can tune an existing matrix multiplication implementation, reducing the number of cache misses by 67% and, consequently, the computation time by up to 22%. Finally, we show a relationship between cache misses and the gained speedup as a novel figure of merit for measuring the quality of the method.
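The abstract does not give the authors' memory mapping or scheduler, so the following is only a minimal sketch of a common, related idiom: first-touch page placement with a matching static OpenMP schedule. It assumes an operating system that places a page on the NUMA node of the thread that first writes it (the Linux default) and threads that stay pinned to their cores. The function names `alloc_first_touch` and `matmul_rows` are hypothetical and are not taken from the paper.

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocate an n x n row-major matrix and fill it in parallel.
 * Assumption: under a first-touch policy, each block of rows ends up in
 * memory local to the thread that writes it here. */
double *alloc_first_touch(size_t n, double value)
{
    double *m = malloc(n * n * sizeof *m);
    if (m == NULL)
        return NULL;

    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++) {
        size_t row = (size_t)i;
        for (size_t j = 0; j < n; j++)
            m[row * n + j] = value;
    }
    return m;
}

/* C = A * B using the same static row partition as the initialisation,
 * so each thread mostly reads rows of A and writes rows of C that it
 * touched first. B is still read by every thread, so this sketch does
 * not localise accesses to B. */
void matmul_rows(size_t n, const double *a, const double *b, double *c)
{
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++) {
        size_t row = (size_t)i;
        for (size_t j = 0; j < n; j++)
            c[row * n + j] = 0.0;
        for (size_t k = 0; k < n; k++) {
            double aik = a[row * n + k];
            for (size_t j = 0; j < n; j++)
                c[row * n + j] += aik * b[k * n + j];
        }
    }
}
```

A caller would allocate A, B and C with `alloc_first_touch` and then call `matmul_rows`. The rows only stay local if both loops run with the same number of threads and the threads do not migrate, for example by setting `OMP_PROC_BIND=true`. Compiled without OpenMP support, the pragmas are ignored and the code runs serially with the same result.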


Pages: 12

50 items in total

[1] A Case for NUMA-Aware Contention Management on Multicore Systems
Blagodurov, Sergey
Zhuravlev, Sergey
Fedorova, Alexandra
Kamali, Ali
[J]. PACT 2010: PROCEEDINGS OF THE NINETEENTH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, 2010, : 557 - 558
[2] NUMA-Aware Dense Matrix Factorizations and Inversion with Look-Ahead on Multicore Processors
Catalan, Sandra
Igual, Francisco D.
Rodriguez-Sanchez, Rafael
Herrero, Jose R.
Quintana-Orti, Enrique S.
[J]. 2022 IEEE 34TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD 2022), 2022, : 91 - 99
[3] NUMA-aware memory coloring for multicore real-time systems
Pan, Xing
Mueller, Frank
[J]. JOURNAL OF SYSTEMS ARCHITECTURE, 2021, 118
[4] Compact NUMA-aware Locks
Dice, Dave
Kogan, Alex
[J]. PROCEEDINGS OF THE FOURTEENTH EUROSYS CONFERENCE 2019 (EUROSYS '19), 2019,
[5] NUMA-Aware Task Performance Analysis
Schmidl, Dirk
Mueller, Matthias S.
[J]. OpenMP: Memory, Devices, and Tasks, 2016, 9903 : 77 - 88
[6] Scalable Adaptive NUMA-Aware Lock
Zhang, Mingzhe
Chen, Haibo
Cheng, Luwei
Lau, Francis C. M.
Wang, Cho-Li
[J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2017, 28 (06) : 1754 - 1769
[7] A NUMA-Aware Recoverable Mutex Lock
Fahmy, Ahmed
Golab, Wojciech
[J]. PROCEEDINGS OF THE 34TH ACM SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES, SPAA 2022, 2022, : 295 - 305
[8] A NUMA-Aware Recoverable Mutex Lock
Fahmy, Ahmed
Golab, Wojciech
[J]. Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2022, : 295 - 305
[9] Beyond the Socket: NUMA-Aware GPUs
Milic, Ugljesa
Villa, Oreste
Bolotin, Evgeny
Arunkumar, Akhil
Ebrahimi, Eiman
Jaleel, Aamer
Ramirez, Alex
Nellans, David
[J]. 50TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO), 2017, : 123 - 135
[10] Massively Parallel NUMA-Aware Hash Joins
Lang, Harald
Leis, Viktor
Albutiu, Martina-Cezara
Neumann, Thomas
Kemper, Alfons
[J]. IN MEMORY DATA MANAGEMENT AND ANALYSIS, 2015, 8921 : 3 - 14
