swSpAMM: optimizing large-scale sparse approximate matrix multiplication on Sunway Taihulight

Cited by: 0

Authors:

LIU Xiaoyan [1,2]

LIU Yi [2]

YIN Bohong [2]

YANG Hailong [1,2]

LUAN Zhongzhi [2]

QIAN Depei [2]

Affiliations:

[1] State Key Laboratory of Software Development Environment, Beijing, China

[2] School of Computer Science and Engineering, Beihang University, Beijing

Source:

Frontiers of Computer Science | 2023 / Vol. 17 / Issue 04

Keywords:

approximate calculation; Sunway processor; performance optimization

DOI:

Not available

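Chinese Library Classification (CLC) number:

TP338.4 [Large-scale and giant computers]; TP332 [Arithmetic units and controllers (CPU)]; TP18 [Artificial intelligence theory]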

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Although matrix multiplication plays an essential role in a wide range of applications, previous work has focused only on optimizing dense or sparse matrix multiplication. Sparse Approximate Matrix Multiply (SpAMM) is an algorithm for accelerating the multiplication of decay matrices, whose sparsity lies between that of dense and sparse matrices. In addition, large-scale decay matrix multiplication is performed in scientific applications to solve cutting-edge problems. To optimize large-scale decay matrix multiplication using SpAMM on supercomputers such as Sunway Taihulight, we present swSpAMM, an optimized SpAMM algorithm that adapts the computation characteristics to the architectural features of Sunway Taihulight. Specifically, we propose both intra-node and inter-node optimizations to accelerate swSpAMM for large-scale execution. For intra-node optimizations, we explore algorithm parallelization and a block-major data layout tailored to better exploit the architectural advantages of the Sunway processor. For inter-node optimizations, we propose a matrix organization strategy that better distributes sub-matrices across nodes and a dynamic scheduling strategy that improves load balance across nodes. We compare swSpAMM with the existing GEMM library on a single node and with large-scale matrix multiplication methods on multiple nodes. The experimental results show that swSpAMM achieves speedups of up to 14.5× over the xMath library on a single node and up to 2.2× over the 2D GEMM method on multiple nodes.
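For readers unfamiliar with SpAMM, the following is a minimal serial sketch of the general idea as it is commonly described: the product is computed recursively over quadrants, and a sub-block product is skipped when the product of the two blocks' Frobenius norms falls below a threshold. This is not the swSpAMM implementation and says nothing about its Sunway-specific optimizations. The names `spamm`, `fro_norm`, `tau`, and `leaf` are illustrative assumptions, and the sketch assumes square matrices whose size is a power of two.

```python
import math


def fro_norm(M, r0, c0, size):
    """Frobenius norm of the size x size block of M starting at (r0, c0)."""
    return math.sqrt(sum(M[r0 + i][c0 + j] ** 2
                         for i in range(size) for j in range(size)))


def spamm(A, B, tau, leaf=2):
    """Approximate C = A * B, skipping block products with small norm product.

    Assumes A and B are n x n lists of lists with n a power of two (>= leaf).
    Returns (C, stats), where stats counts leaf products and skipped blocks.
    """
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    stats = {"leaf_products": 0, "skipped": 0}

    def rec(ar, ac, br, bc, size):
        # Multiply A[ar:ar+size, ac:ac+size] by B[br:br+size, bc:bc+size]
        # and accumulate into C[ar:ar+size, bc:bc+size].
        if fro_norm(A, ar, ac, size) * fro_norm(B, br, bc, size) < tau:
            stats["skipped"] += 1  # contribution deemed negligible
            return
        if size <= leaf:
            stats["leaf_products"] += 1
            for i in range(size):
                for k in range(size):
                    a = A[ar + i][ac + k]
                    for j in range(size):
                        C[ar + i][bc + j] += a * B[br + k][bc + j]
            return
        h = size // 2
        for i in (0, h):
            for j in (0, h):
                for k in (0, h):
                    rec(ar + i, ac + k, br + k, bc + j, h)

    rec(0, 0, 0, 0, n)
    return C, stats


def decay_matrix(n, base=10.0, rate=2):
    """Toy decay matrix: entries shrink exponentially away from the diagonal."""
    return [[base ** (-rate * abs(i - j)) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    A = decay_matrix(8)
    C, stats = spamm(A, A, tau=1e-3)
    print(stats)
```

With `tau=0` no block is ever skipped and the result matches an ordinary product; raising `tau` trades accuracy for fewer leaf products, which is where the savings on decay matrices come from.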
