swSpAMM: optimizing large-scale sparse approximate matrix multiplication on Sunway Taihulight

Cited by: 0

Authors:

LIU Xiaoyan [1,2]

LIU Yi [2]

YIN Bohong [2]

YANG Hailong [1,2]

LUAN Zhongzhi [2]

QIAN Depei [2]

Affiliations:

[1] State Key Laboratory of Software Development Environment, Beijing, China

[2] School of Computer Science and Engineering, Beihang University, Beijing, China

Source:

Frontiers of Computer Science | 2023 / Vol. 17 / No. 4

Keywords:

approximate calculation; Sunway processor; performance optimization

DOI:

Not available

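Chinese Library Classification (CLC) numbers:

TP338.4 [Large-scale and giant computers]; TP332 [Arithmetic units and controllers (CPU)]; TP18 [Artificial intelligence theory]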

Discipline codes:

081104; 0812; 0835; 1405

Abstract:

Although matrix multiplication plays an essential role in a wide range of applications, previous work has focused only on optimizing either dense or sparse matrix multiplication. Sparse Approximate Matrix Multiply (SpAMM) is an algorithm that accelerates the multiplication of decay matrices, whose sparsity lies between that of dense and sparse matrices. Moreover, large-scale decay matrix multiplication is performed in scientific applications to solve cutting-edge problems. To optimize large-scale decay matrix multiplication using SpAMM on supercomputers such as Sunway Taihulight, we present swSpAMM, an optimized SpAMM algorithm that adapts the computation characteristics to the architectural features of Sunway Taihulight. Specifically, we propose both intra-node and inter-node optimizations to accelerate swSpAMM for large-scale execution. For intra-node optimizations, we explore algorithm parallelization and a block-major data layout tailored to better exploit the architectural advantages of the Sunway processor. For inter-node optimizations, we propose a matrix organization strategy that better distributes sub-matrices across nodes and a dynamic scheduling strategy that improves load balance across nodes. We compare swSpAMM with the existing GEMM library on a single node and with large-scale matrix multiplication methods on multiple nodes. Experimental results show that swSpAMM achieves speedups of up to 14.5× and 2.2× compared to the xMath library on a single node and the 2D GEMM method on multiple nodes, respectively.
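To make the SpAMM idea mentioned in the abstract concrete, the following is a minimal serial sketch of the standard SpAMM recursion, not the swSpAMM implementation. It assumes square matrices whose dimension is a power of two and uses Frobenius norms as the pruning criterion: the matrices are split recursively into quadrants, and any sub-product whose norm product ||A_ik||·||B_kj|| falls below a tolerance tau is skipped, which pays off for decay matrices whose off-diagonal blocks are small. The names `spamm` and `frob_norm` and the `leaf` parameter are hypothetical and chosen for illustration; the Sunway-specific parallelization, block-major layout, and inter-node scheduling are not modeled.

```python
import math
from typing import List

Matrix = List[List[float]]


def frob_norm(M: Matrix, r0: int, c0: int, size: int) -> float:
    """Frobenius norm of the size x size block of M starting at (r0, c0)."""
    return math.sqrt(sum(M[i][j] ** 2
                         for i in range(r0, r0 + size)
                         for j in range(c0, c0 + size)))


def spamm(A: Matrix, B: Matrix, tau: float, leaf: int = 2) -> Matrix:
    """Approximate C = A * B for n x n matrices (n assumed a power of two).

    Hypothetical illustration: sub-products with ||A_ik||_F * ||B_kj||_F < tau
    are skipped; blocks of size <= leaf are multiplied densely.
    """
    n = len(A)
    C = [[0.0] * n for _ in range(n)]

    def recurse(ar, ac, br, bc, cr, cc, size):
        # Prune: ||X Y||_F <= ||X||_F * ||Y||_F, so a small norm product
        # bounds the contribution this sub-product would add to C.
        if frob_norm(A, ar, ac, size) * frob_norm(B, br, bc, size) < tau:
            return
        if size <= leaf:
            # Dense leaf kernel, accumulated into the target block of C.
            for i in range(size):
                for j in range(size):
                    s = 0.0
                    for k in range(size):
                        s += A[ar + i][ac + k] * B[br + k][bc + j]
                    C[cr + i][cc + j] += s
            return
        h = size // 2
        # Quadtree step: C_ij += A_ik * B_kj for quadrant indices i, j, k in {0, 1}.
        for i in range(2):
            for j in range(2):
                for k in range(2):
                    recurse(ar + i * h, ac + k * h,
                            br + k * h, bc + j * h,
                            cr + i * h, cc + j * h, h)

    recurse(0, 0, 0, 0, 0, 0, n)
    return C
```

Setting `tau = 0` reproduces the exact product, while larger values trade accuracy for skipped work. For simplicity this sketch recomputes block norms at every recursion level; a practical version would typically precompute them once in a quadtree.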
