NUMA-Aware Multicore Matrix Multiplication

Cited by: 1
Authors
Alkowaileet, Wail Y. [1]
Carrillo-Cisneros, David [1]
Lim, Robert V. [1]
Scherson, Isaac D. [1]
Affiliation
[1] Univ Calif Irvine, Dept Comp Sci Syst, Irvine, CA 92697 USA
Keywords
ccNUMA; matrix multiplication; multicore; multi-socket
DOI
10.1142/S0129626414500066
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
A user-level scheduling scheme, combined with a specific data alignment, is presented for matrix multiplication on cache-coherent Non-Uniform Memory Access (ccNUMA) architectures. Addressing the data locality problem that can occur in such systems potentially alleviates memory bottlenecks. We show experimentally that a thread scheduler that is agnostic of data placement (e.g., OpenMP 3.1) produces a high number of cache misses on a ccNUMA machine. To overcome this memory contention problem, we show how proper memory mapping and scheduling can tune an existing matrix multiplication implementation, reducing the number of cache misses by 67% and, consequently, the computation time by up to 22%. Finally, we show a relationship between cache misses and the gained speedup as a novel figure of merit to measure the quality of the method.
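The abstract does not spell out the mapping itself, so the following is only a minimal sketch of the general idea of pairing data placement with scheduling: rows of the output matrix are split into contiguous blocks, and each block is assigned to one (node, thread) pair, so that a worker only touches rows that would live on its own NUMA node. The names `partition_rows` and `matmul_by_plan` are hypothetical, the workers are simulated sequentially, and no memory is actually placed or pinned here; this is not the authors' implementation.

```python
from typing import Dict, List, Tuple

Plan = Dict[Tuple[int, int], range]


def partition_rows(n_rows: int, n_nodes: int, threads_per_node: int) -> Plan:
    """Assign contiguous row blocks to (node, local_thread) pairs.

    Rows owned by the threads of one node form a single contiguous range,
    which is the property a NUMA-aware allocator would rely on to keep
    those rows in that node's local memory (assumption for illustration).
    """
    n_workers = n_nodes * threads_per_node
    base, extra = divmod(n_rows, n_workers)
    plan: Plan = {}
    start = 0
    for node in range(n_nodes):
        for t in range(threads_per_node):
            worker = node * threads_per_node + t
            # The first `extra` workers take one additional row each.
            size = base + (1 if worker < extra else 0)
            plan[(node, t)] = range(start, start + size)
            start += size
    return plan


def matmul_by_plan(A: List[List[float]], B: List[List[float]], plan: Plan) -> List[List[float]]:
    """Compute C = A * B, visiting rows in the order the plan assigns them.

    Workers run one after another here; a real version would run each
    (node, thread) entry on a thread pinned to that node.
    """
    inner, n_cols = len(B), len(B[0])
    C = [[0] * n_cols for _ in A]
    for (_node, _thread), rows in plan.items():
        for i in rows:
            for k in range(inner):
                a_ik = A[i][k]
                for j in range(n_cols):
                    C[i][j] += a_ik * B[k][j]
    return C
```

In a real setting the plan would be combined with thread pinning and a matching allocation policy (for example, first-touch initialization by the owning thread), which this sketch deliberately leaves out.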
Pages: 12