A data locality methodology for matrix–matrix multiplication algorithm

Authors
Nicolaos Alachiotis
Vasileios I. Kelefouras
George S. Athanasiou
Harris E. Michail
Angeliki S. Kritikakou
Costas E. Goutis
Affiliations
[1] University of Patras,VLSI Design Lab., Electrical & Computer Engineering Department
Keywords
Compilers; Memory management; Data locality; Data reuse; Recursive array layouts; Scheduling; Strassen’s algorithm; Matrix-matrix multiplication
DOI
Not available
Abstract
Matrix-Matrix Multiplication (MMM) is a highly important kernel in linear algebra algorithms, and the performance of its implementations depends on memory utilization and data locality. Several MMM algorithms exist, such as the standard algorithm and the Strassen–Winograd variant, along with many recursive array layouts, such as Z-Morton and U-Morton. However, their data locality is lower than that of the proposed methodology. Moreover, several state-of-the-art (SOA) self-tuning libraries exist, such as ATLAS, which tests many MMM implementations. Installing ATLAS requires an extremely complex empirical tuning step and uses a large number of compiler options, both of which are outside the scope of this paper. In this paper, a new methodology using the standard MMM algorithm is presented, achieving improved performance by focusing on data locality (both temporal and spatial). The methodology finds the scheduling that conforms with the optimum memory management. Compared with (Chatterjee et al. in IEEE Trans. Parallel Distrib. Syst. 13:1105, 2002; Li and Garzaran in Proc. of Lang. Compil. Parallel Comput., 2005; Bilmes et al. in Proc. of the 11th ACM Int. Conf. Super-comput., 1997; Aberdeen and Baxter in Concurr. Comput. Pract. Exp. 13:103, 2001), the proposed methodology has two major advantages. Firstly, the scheduling used at the tile level differs from the one used at the element level; it has better data locality and is suited to the sizes of the memory hierarchy. Secondly, its exploration time is short, because it searches only for the number of tiling levels to use, and between (1, 2) (Sect. 4) to find the best tile size for each cache level. A software tool (C code) implementing this methodology was developed; it takes the hardware model and the matrix sizes as input. The methodology outperforms the others across a wide range of architectures.
Compared with the best existing related work, which we implemented, the proposed methodology performs up to 55% better than the standard MMM algorithm and up to 35% better than Strassen’s, both under recursive data array layouts.
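To make the idea of tiling for data locality concrete, here is a minimal C sketch of a one-level tiled MMM next to the plain triple loop. It is not the authors' scheduling or their tool: the function names `mmm_naive` and `mmm_tiled`, the helper `min_size`, the row-major layout, and the particular loop orders are all assumptions chosen for illustration. The tile loops run in (i, j, k) order while the element loops inside a tile run in (i, k, j) order, only to show that the two levels can use different schedules.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smaller of two sizes. */
static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Reference triple loop: C = A * B for row-major n x n matrices. */
void mmm_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}

/* One-level tiled variant (illustrative assumption, not the paper's
 * schedule). The tile size is a free parameter here; the paper derives
 * it from the cache sizes instead. */
void mmm_tiled(size_t n, size_t tile,
               const double *A, const double *B, double *C)
{
    assert(tile > 0);

    for (size_t idx = 0; idx < n * n; idx++)
        C[idx] = 0.0;

    /* Tile-level loops: (i, j, k) order over tile origins. */
    for (size_t ii = 0; ii < n; ii += tile) {
        for (size_t jj = 0; jj < n; jj += tile) {
            for (size_t kk = 0; kk < n; kk += tile) {
                size_t i_end = min_size(ii + tile, n);
                size_t j_end = min_size(jj + tile, n);
                size_t k_end = min_size(kk + tile, n);

                /* Element-level loops: (i, k, j) order, so the innermost
                 * loop walks rows of B and C with unit stride. */
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
                }
            }
        }
    }
}
```

Calling `mmm_tiled(n, tile, A, B, C)` with a tile size chosen so that three tiles fit in a given cache level is the usual starting point; how the tile size and the number of tiling levels are actually selected is the subject of the paper and is not reproduced here.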
Pages: 830-851
Page count: 21
Related papers
50 in total
  • [21] Data Preservation by hash algorithm for matrix multiplication over venomous cloud
    Goel, Gaurav
    Tiwari, Rajeev
    Rishiwal, Vinay
    Upadhyay, Shuchi
    2018 FIFTH INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND GRID COMPUTING (IEEE PDGC), 2018, : 210 - 214
  • [22] A high-performance matrix–matrix multiplication methodology for CPU and GPU architectures
    Vasilios Kelefouras
    A. Kritikakou
    Iosif Mporas
    Vasilios Kolonias
    The Journal of Supercomputing, 2016, 72 : 804 - 844
  • [23] Single Matrix Block Shift (SMBS) Dense Matrix Multiplication Algorithm
    Ohene-Kwofie, Daniel
    Hazelhurst, Scott
    SOUTH AFRICAN COMPUTER SCIENCE AND INFORMATION SYSTEMS RESEARCH TRENDS, SAICSIT 2024, 2024, 2159 : 190 - 206
  • [24] Parallel Algorithm for Quasi-Band Matrix-Matrix Multiplication
    Vooturi, Dharma Teja
    Kothapalli, Kishore
    PARALLEL PROCESSING AND APPLIED MATHEMATICS, PPAM 2015, PT I, 2016, 9573 : 106 - 115
  • [25] A New Fast Recursive Matrix Multiplication Algorithm
    L. D. Jelfimova
    Cybernetics and Systems Analysis, 2019, 55 : 547 - 551
  • [26] An improved combinatorial algorithm for Boolean matrix multiplication
    Yu, Huacheng
    INFORMATION AND COMPUTATION, 2018, 261 : 240 - 247
  • [27] An Improved Combinatorial Algorithm for Boolean Matrix Multiplication
    Yu, Huacheng
    AUTOMATA, LANGUAGES, AND PROGRAMMING, PT I, 2015, 9134 : 1094 - 1105
  • [28] Adaptive Flip Graph Algorithm for Matrix Multiplication
    Arai, Yamato
    Ichikawa, Yuma
    Hukushima, Koji
    PROCEEDINGS OF THE 2024 INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND ALGEBRAIC COMPUTATION, ISSAC 2024, 2024, : 292 - 298
  • [29] A New Fast Recursive Matrix Multiplication Algorithm
    Jelfimova, L. D.
    CYBERNETICS AND SYSTEMS ANALYSIS, 2019, 55 (04) : 547 - 551
  • [30] IMPROVED ALGORITHM FOR BOOLEAN MATRIX MULTIPLICATION.
    Santoro, N.
    Urrutia, J.
    Computing (Vienna/New York), 1986, 36 (04) : 375 - 382