swSpAMM: optimizing large-scale sparse approximate matrix multiplication on Sunway Taihulight

Authors
LIU Xiaoyan [1, 2]
LIU Yi [2]
YIN Bohong [2]
YANG Hailong [1, 2]
LUAN Zhongzhi [2]
QIAN Depei [2]
Affiliations
[1] State Key Laboratory of Software Development Environment, Beijing, China
[2] School of Computer Science and Engineering, Beihang University, Beijing,
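Keywords
approximate calculation; Sunway processor; performance optimization
DOI
Not available
Chinese Library Classification (CLC) number
TP338.4 [Large and giant computers]; TP332 [Arithmetic units and controllers (CPU)]; TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract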
Although matrix multiplication plays an essential role in a wide range of applications, previous work has focused only on optimizing dense or sparse matrix multiplication. Sparse Approximate Matrix Multiply (SpAMM) is an algorithm for accelerating the multiplication of decay matrices, whose sparsity lies between that of dense and sparse matrices. Large-scale decay matrix multiplication is also performed in scientific applications to solve cutting-edge problems. To optimize large-scale decay matrix multiplication using SpAMM on supercomputers such as Sunway Taihulight, we present swSpAMM, an optimized SpAMM algorithm that adapts the computation characteristics to the architectural features of Sunway Taihulight. Specifically, we propose both intra-node and inter-node optimizations to accelerate swSpAMM for large-scale execution. For intra-node optimization, we explore algorithm parallelization and a block-major data layout tailored to better exploit the architectural advantages of the Sunway processor. For inter-node optimization, we propose a matrix organization strategy that better distributes sub-matrices across nodes and a dynamic scheduling strategy that improves load balance across nodes. We compare swSpAMM with the existing GEMM library on a single node and with large-scale matrix multiplication methods on multiple nodes. The experimental results show that swSpAMM achieves speedups of up to 14.5× and 2.2× over the xMath library on a single node and the 2D GEMM method on multiple nodes, respectively.
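The abstract names SpAMM without describing its mechanics. As a rough illustration only, the sketch below assumes the commonly described recursive formulation: a block product is skipped whenever the product of the two blocks' Frobenius norms falls below a threshold. The names `spamm`, `decay_matrix`, `run`, `tau`, and `leaf` are hypothetical and do not come from swSpAMM, and none of the Sunway-specific intra-node or inter-node optimizations are modelled.

```python
import math

def decay_matrix(n, rate=1.0):
    """Hypothetical test input: entries decay exponentially away from the diagonal."""
    return [[math.exp(-rate * abs(i - j)) for j in range(n)] for i in range(n)]

def frob_norm(M, r0, c0, n):
    """Frobenius norm of the n x n block of M whose top-left corner is (r0, c0)."""
    return math.sqrt(sum(M[r0 + i][c0 + j] ** 2
                         for i in range(n) for j in range(n)))

def spamm(A, B, C, ar, ac, br, bc, n, tau, leaf=2):
    """Accumulate A[ar:, ac:] * B[br:, bc:] (n x n blocks) into C[ar:, bc:].

    Returns the number of leaf block products actually computed.
    Assumes n equals leaf times a power of two. Norms are recomputed at
    every level for brevity; a real implementation would precompute them.
    """
    if frob_norm(A, ar, ac, n) * frob_norm(B, br, bc, n) < tau:
        return 0  # contribution judged negligible: skip the whole sub-product
    if n <= leaf:
        for i in range(n):
            for k in range(n):
                a = A[ar + i][ac + k]
                for j in range(n):
                    C[ar + i][bc + j] += a * B[br + k][bc + j]
        return 1
    h = n // 2
    done = 0
    for i in (0, h):          # row half of C
        for j in (0, h):      # column half of C
            for k in (0, h):  # shared inner dimension
                done += spamm(A, B, C, ar + i, ac + k, br + k, bc + j,
                              h, tau, leaf)
    return done

def run(A, B, tau):
    """Multiply A and B with threshold tau; return (C, leaf products computed)."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    leaves = spamm(A, B, C, 0, 0, 0, 0, n, tau)
    return C, leaves

n = 16
A, B = decay_matrix(n), decay_matrix(n)
reference = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]

def max_abs_err(C):
    return max(abs(C[i][j] - reference[i][j])
               for i in range(n) for j in range(n))

C_exact, exact_leaves = run(A, B, tau=0.0)     # tau = 0 skips nothing
C_approx, approx_leaves = run(A, B, tau=1e-3)  # small blocks far from the diagonal are skipped
exact_err = max_abs_err(C_exact)
approx_err = max_abs_err(C_approx)
```

With `tau = 0.0` every leaf product is computed and the result matches the plain triple loop; raising `tau` trades a bounded error for fewer leaf products, which is the property that makes the method attractive for decay matrices.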
Related papers
50 in total
  • [31] Fast Compressive Large-Scale Matrix-Matrix Multiplication Using Product Codes
    Ocal, Orhan
    Ramchandran, Kannan
    2020 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2020, : 1426 - 1431
  • [32] Sparse approximate matrix-matrix multiplication for density matrix purification with error control
    Artemov, Anton G.
    Rubensson, Emanuel H.
    JOURNAL OF COMPUTATIONAL PHYSICS, 2021, 438
  • [33] On optimality of approximate low rank solutions of large-scale matrix equations
    Benner, Peter
    Breiten, Tobias
    SYSTEMS & CONTROL LETTERS, 2014, 67 : 55 - 64
  • [34] On Distributed Multiplication of Large-Scale Matrices
    Glushan, V. M.
    Lozovoy, A. Yu
    2021 IEEE 15TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT2021), 2021,
  • [35] Selective Inversion of Inductance Matrix for Large-Scale Sparse RLC Simulation
    Apostolopoulou, Ifigeneia
    Daloukas, Konstantis
    Evmorfopoulos, Nestor
    Stamoulis, George
    2014 51ST ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2014,
  • [36] A novel publicly delegable secure outsourcing algorithm for large-scale matrix multiplication
    Kumar, Malay
    Mishra, Vaibhav
    Shukla, Anurag
    Singh, Munendra
    Vardhan, Manu
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 38 (05) : 6445 - 6455
  • [37] MALMM: A multi-array architecture for large-scale matrix multiplication on FPGA
    Huang, You
    Shen, Junzhong
    Qiao, Yuran
    Wen, Mei
    Zhang, Chunyuan
    IEICE ELECTRONICS EXPRESS, 2018, 15 (10):
  • [38] Factored LT and Factored Raptor Codes for Large-Scale Distributed Matrix Multiplication
    Pradhan, Asit Kumar
    Heidarzadeh, Anoosheh
    Narayanan, Krishna R.
    2020 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2020, : 239 - 244
  • [39] Efficient Processing of Large-Scale Sparse Matrix-Matrix Multiplications on a Single Machine
    Jo, Yong-Yeon
    Lee, Kyuhwan
    Jang, Myung-Hwan
    Kim, Sang-Wook
    Song, Eunjee
    2017 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2017, : 1908 - 1913
  • [40] Accelerating approximate matrix multiplication for near-sparse matrices on GPUs
    Xiaoyan Liu
    Yi Liu
    Hailong Yang
    Ming Dun
    Bohong Yin
    Zhongzhi Luan
    Depei Qian
    The Journal of Supercomputing, 2022, 78 : 11464 - 11491