swSpAMM: optimizing large-scale sparse approximate matrix multiplication on Sunway Taihulight

Authors
LIU Xiaoyan [1, 2]
LIU Yi [2]
YIN Bohong [2]
YANG Hailong [1, 2]
LUAN Zhongzhi [2]
QIAN Depei [2]
Affiliations
[1] State Key Laboratory of Software Development Environment, Beijing, China
[2] School of Computer Science and Engineering, Beihang University, Beijing,
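Keywords
approximate calculation; Sunway processor; performance optimization;
DOI
Not available
Chinese Library Classification (CLC) number
TP338.4 [Large-scale and giant computers]; TP332 [Arithmetic units and controllers (CPU)]; TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract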
Although matrix multiplication plays an essential role in a wide range of applications, previous work has focused only on optimizing dense or sparse matrix multiplication. Sparse Approximate Matrix Multiply (SpAMM) is an algorithm that accelerates the multiplication of decay matrices, whose sparsity lies between that of dense and sparse matrices. Moreover, large-scale decay matrix multiplication is performed in scientific applications to solve cutting-edge problems. To optimize large-scale decay matrix multiplication using SpAMM on supercomputers such as Sunway Taihulight, we present swSpAMM, an optimized SpAMM algorithm that adapts the computation characteristics to the architectural features of Sunway Taihulight. Specifically, we propose both intra-node and inter-node optimizations to accelerate swSpAMM for large-scale execution. For intra-node optimizations, we explore algorithm parallelization and a block-major data layout tailored to better exploit the architectural advantages of the Sunway processor. For inter-node optimizations, we propose a matrix organization strategy for better distributing sub-matrices across nodes and a dynamic scheduling strategy for improving load balance across nodes. We compare swSpAMM with the existing GEMM library on a single node as well as with large-scale matrix multiplication methods on multiple nodes. The experimental results show that swSpAMM achieves speedups of up to 14.5× and 2.2× over the xMath library on a single node and the 2D GEMM method on multiple nodes, respectively.
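For readers unfamiliar with SpAMM, the following is a minimal sketch of the general idea, not the swSpAMM implementation described in the abstract. It assumes the commonly described formulation in which the matrices are split recursively into quadrants and a sub-product is skipped when the product of the Frobenius norms of its operands falls below a threshold. The names `spamm`, `decay_matrix`, `tau`, and `leaf` are hypothetical, and the sketch assumes square matrices whose size is a power of two.

```python
import numpy as np


def decay_matrix(n, rate=1.0):
    """Hypothetical test input: entries decay exponentially away from the diagonal."""
    idx = np.arange(n)
    return np.exp(-rate * np.abs(idx[:, None] - idx[None, :]))


def spamm(a, b, tau, leaf=2):
    """Approximate a @ b, skipping sub-products whose norm bound is below tau.

    Assumes a and b are square with a power-of-two size (illustrative only).
    """
    n = a.shape[0]
    # ||A||_F * ||B||_F is an upper bound on ||A @ B||_F, so a small bound
    # means the whole sub-product contributes little and can be dropped.
    if np.linalg.norm(a) * np.linalg.norm(b) < tau:
        return np.zeros((n, n))
    if n <= leaf:
        return a @ b  # small enough: fall back to a dense multiply
    h = n // 2
    c = np.zeros((n, n))
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                # C[i, j] accumulates A[i, k] @ B[k, j] over the quadrants.
                c[i * h:(i + 1) * h, j * h:(j + 1) * h] += spamm(
                    a[i * h:(i + 1) * h, k * h:(k + 1) * h],
                    b[k * h:(k + 1) * h, j * h:(j + 1) * h],
                    tau,
                    leaf,
                )
    return c
```

With `tau = 0` no sub-product is skipped and the result matches the dense product; a larger `tau` trades accuracy for fewer block multiplications, which is where decay matrices benefit because most off-diagonal blocks have small norms.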
Related papers
  • [1] swSpAMM: optimizing large-scale sparse approximate matrix multiplication on Sunway Taihulight
    Liu, Xiaoyan
    Liu, Yi
    Yin, Bohong
    Yang, Hailong
    Luan, Zhongzhi
    Qian, Depei
    FRONTIERS OF COMPUTER SCIENCE, 2023, 17 (04)
  • [2] Performance-Aware Model for Sparse Matrix-Matrix Multiplication on the Sunway TaihuLight Supercomputer
    Chen, Yuedan
    Li, Kenli
    Yang, Wangdong
    Xiao, Guoqing
    Xie, Xianghui
    Li, Tao
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2019, 30 (04) : 923 - 938
  • [3] Large-scale Simulations of Peridynamics on Sunway Taihulight Supercomputer
    Li, Xinyuan
    Ye, Huang
    Zhang, Jian
    PROCEEDINGS OF THE 49TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2020, 2020,
  • [4] Enabling Large-Scale Simulation of CAM on the Sunway TaihuLight Supercomputer
    Li, Yuxuan
    Duan, Xiaohui
    Gan, Lin
    Wan, Wubing
    Chen, Yuhu
    Xu, Kai
    Yang, Jinzhe
    Liu, Weiguo
    Xue, Wei
    Fu, Haohuan
    Yang, Guangwen
    IEEE TRANSACTIONS ON COMPUTERS, 2022, 71 (04) : 824 - 837
  • [5] swFLOW: A large-scale distributed framework for deep learning on Sunway TaihuLight supercomputer
    Li, Mingfan
    Lin, Han
    Chen, Junshi
    Diaz, Jose Monsalve
    Xiao, Qian
    Lin, Rongfen
    Wang, Fei
    Gao, Guang R.
    An, Hong
    INFORMATION SCIENCES, 2021, 570 (570) : 831 - 847
  • [6] A dynamic agricultural prediction system for large-scale drought assessment on the Sunway TaihuLight supercomputer
    Huang, Xiao
    Yu, Chaoqing
    Fang, Jiarui
    Huang, Guorui
    Ni, Shaoqiang
    Hall, Jim
    Zorn, Conrad
    Huang, Xiaomeng
    Zhang, Wenyuan
    COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2018, 154 : 400 - 410
  • [7] Improving performance of sparse matrix dense matrix multiplication on large-scale parallel systems
    Acer, Seher
    Selvitopi, Oguz
    Aykanat, Cevdet
    PARALLEL COMPUTING, 2016, 59 : 71 - 96
  • [8] Sparstition: A Partitioning Scheme for Large-Scale Sparse Matrix Vector Multiplication on FPGA
    Sigurbergsson, Bjorn
    Hogervorst, Tom
    Tong Dong Qiu
    Nane, Razvan
    2019 IEEE 30TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP 2019), 2019, : 51 - 58
  • [9] Bio-ESMD: A Data Centric Implementation for Large-Scale Biological System Simulation on Sunway TaihuLight Supercomputer
    Duan, Xiaohui
    Shao, Qi
    Weng, Junben
    Schmidt, Bertil
    Gan, Lin
    Li, Guohui
    Fu, Haohuan
    Xue, Wei
    Liu, Weiguo
    Yang, Guangwen
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2023, 34 (03) : 881 - 893
  • [10] On Large-Scale Matrix-Matrix Multiplication On Compressed Structures
    Krishna, Sudhindra Gopal
    Narasimhan, Aditya
    Radhakrishnan, Sridhar
    Veras, Richard
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 2976 - 2985