A data locality methodology for matrix–matrix multiplication algorithm

Cited by: 0
Authors
Nicolaos Alachiotis
Vasileios I. Kelefouras
George S. Athanasiou
Harris E. Michail
Angeliki S. Kritikakou
Costas E. Goutis
Institution
[1] University of Patras, VLSI Design Lab., Electrical & Computer Engineering Department
Keywords
Compilers; Memory management; Data locality; Data reuse; Recursive array layouts; Scheduling; Strassen's algorithm; Matrix-matrix multiplication
DOI: Not available
Abstract
Matrix-Matrix Multiplication (MMM) is a highly important kernel in linear algebra, and the performance of its implementations depends on memory utilization and data locality. Several MMM algorithms exist, such as the standard algorithm and the Strassen-Winograd variant, together with many recursive array layouts, such as Z-Morton or U-Morton; however, their data locality is lower than that of the proposed methodology. Moreover, several state-of-the-art (SOA) self-tuning libraries exist, such as ATLAS for the MMM algorithm, which tests many MMM implementations. The installation of ATLAS requires an extremely complex empirical tuning step and relies on a large number of compiler options, both of which are outside the scope of this paper. In this paper, a new methodology using the standard MMM algorithm is presented, which achieves improved performance by focusing on data locality (both temporal and spatial); it finds the schedule that conforms to the optimum memory management. Compared with (Chatterjee et al. in IEEE Trans. Parallel Distrib. Syst. 13:1105, 2002; Li and Garzaran in Proc. of Lang. Compil. Parallel Comput., 2005; Bilmes et al. in Proc. of the 11th ACM Int. Conf. Supercomput., 1997; Aberdeen and Baxter in Concurr. Comput. Pract. Exp. 13:103, 2001), the proposed methodology has two major advantages. Firstly, the schedule used at the tile level differs from the one used at the element level, giving better data locality suited to the sizes of the memory hierarchy. Secondly, its exploration time is short, because it searches only for the number of tiling levels used and, within (1, 2) (Sect. 4), for the best tile size of each cache level. A software tool (C code) implementing the above methodology was developed, taking the hardware model and the matrix sizes as input. The methodology outperforms other approaches over a wide range of architectures: compared with the best existing related work, which we implemented, performance gains of up to 55% over the standard MMM algorithm and up to 35% over Strassen's algorithm are observed, both under recursive data array layouts.
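To make the distinction between tile-level and element-level scheduling concrete, the sketch below shows a one-level tiled (blocked) MMM in C. It is a minimal illustration under stated assumptions, not the paper's methodology: the constants N and TILE are placeholder values, the paper instead derives the number of tiling levels and the tile sizes from the memory-hierarchy parameters, and the i-k-j element order shown is only one common locality-friendly schedule.

#include <stdio.h>
#include <stdlib.h>

#define N    512   /* matrix dimension (placeholder, assumed square)           */
#define TILE 64    /* tile edge, would be chosen per cache level (placeholder) */

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* C = C + A * B for row-major N x N matrices, with one level of tiling.
 * The three outer loops form the tile-level schedule; the three inner
 * loops form the element-level schedule inside one tile combination.   */
static void mmm_tiled(const double *A, const double *B, double *C)
{
    for (int ii = 0; ii < N; ii += TILE)
        for (int kk = 0; kk < N; kk += TILE)
            for (int jj = 0; jj < N; jj += TILE)
                for (int i = ii; i < MIN(ii + TILE, N); i++)
                    for (int k = kk; k < MIN(kk + TILE, N); k++) {
                        double a = A[i * N + k];          /* reused across j */
                        for (int j = jj; j < MIN(jj + TILE, N); j++)
                            C[i * N + j] += a * B[k * N + j];
                    }
}

int main(void)
{
    double *A = calloc((size_t)N * N, sizeof *A);
    double *B = calloc((size_t)N * N, sizeof *B);
    double *C = calloc((size_t)N * N, sizeof *C);   /* calloc zero-initializes C */
    if (!A || !B || !C)
        return 1;

    for (int i = 0; i < N * N; i++) { A[i] = 1.0; B[i] = 2.0; }

    mmm_tiled(A, B, C);
    printf("C[0][0] = %.1f (expected %.1f)\n", C[0], 2.0 * N);

    free(A); free(B); free(C);
    return 0;
}

Compiling with, e.g., gcc -O2 -std=c99 and varying TILE illustrates the locality effect the paper targets; the actual methodology additionally selects different schedules for the tile and element levels and chooses the number of tiling levels, which a fixed sketch like this does not capture.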
Pages: 830-851
Number of pages: 21
Related papers (50 in total)
  • [41] A Compressed, Divide and Conquer Algorithm for Scalable Distributed Matrix-Matrix Multiplication
    Rasouli, Majid
    Kirby, Robert M.
    Sundar, Hari
    PROCEEDINGS OF INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING IN ASIA-PACIFIC REGION (HPC ASIA 2021), 2020: 110-119
  • [42] Exploiting Online Locality and Reduction Parallelism for Sampled Dense Matrix Multiplication on GPUs
    Yu, Zhongming
    Dai, Guohao
    Huang, Guyue
    Wang, Yu
    Yang, Huazhong
    2021 IEEE 39TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD 2021), 2021: 567-574
  • [43] A Matrix-Matrix Multiplication methodology for single/multi-core architectures using SIMD
    Kelefouras, Vasilios
    Kritikakou, Angeliki
    Goutis, Costas
    JOURNAL OF SUPERCOMPUTING, 2014, 68 (03): 1418-1440
  • [44] Algorithms for Matrix Multiplication via Sampling and Opportunistic Matrix Multiplication
    Harris, David G.
    ALGORITHMICA, 2024, 86 (09): 2822-2844
  • [45] An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data
    Liu, Weifeng
    Vinter, Brian
    2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, 2014
  • [46] Fast Matrix Multiplication Algorithm for a Bank of Digital Filters
    Kreyndelin, V. B.
    Grigorieva, E. D.
    2021 SYSTEMS OF SIGNAL SYNCHRONIZATION, GENERATING AND PROCESSING IN TELECOMMUNICATIONS (SYNCHROINFO), 2021
  • [47] Matrix Multiplication: Practical Use of a Strassen Like Algorithm
    Rozman, Mitja
    Elersic, Miha
    IPSI BGD TRANSACTIONS ON INTERNET RESEARCH, 2019, 15 (01)
  • [48] A NOTE ON A FAST ALGORITHM FOR SPARSE-MATRIX MULTIPLICATION
    COHEN, J
    INFORMATION PROCESSING LETTERS, 1983, 16 (05): 247-248
  • [49] General parallel algorithm of matrix multiplication on the biswapped network
    Cai, Zhaoquan
    Wei, Wenhong
    Journal of Information and Computational Science, 2009, 6 (04): 1737-1742
  • [50] The Mailman algorithm: A note on matrix-vector multiplication
    Liberty, Edo
    Zucker, Steven W.
    INFORMATION PROCESSING LETTERS, 2009, 109 (03): 179-182