Accelerating Sparse LU Factorization with Density-Aware Adaptive Matrix Multiplication for Circuit Simulation

Cited by: 3
Authors
Wang, Tengcheng [1 ]
Li, Wenhao [1 ]
Pei, Haojie [1 ]
Sun, Yuying [1 ]
Jin, Zhou [1 ]
Liu, Weifeng [1 ]
Affiliations
[1] China University of Petroleum, Super Scientific Software Laboratory, Beijing, China
Funding
National Key R&D Program of China; National Natural Science Foundation of China
Keywords
sparse LU factorization; circuit simulation; matrix multiplication; supernodal LU factorization; machine learning; random forest;
DOI
10.1109/DAC56929.2023.10247767
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Sparse LU factorization is one of the most time-consuming components of circuit simulation, particularly for circuits of considerable size in the advanced-process era. Sparse LU factorization can be expedited by exploiting the supernode structure, which partitions the matrix into dense sub-matrices and thereby improves computational performance through level-3 Basic Linear Algebra Subprograms (BLAS) General Matrix Multiplication (GEMM) operations. However, the sparse and irregular structure of circuit matrices often impedes the formation of supernodes, or yields supernodes containing many zero elements, which in turn makes GEMM difficult to exploit effectively. In this paper, by fully utilizing the density of the sub-matrices and combining GEMM with dense-sparse matrix multiplication (SpMM), we propose a density-aware adaptive matrix multiplication, equipped with machine learning techniques, that optimizes the most time-consuming matrix multiplication operator and thereby accelerates sparse LU factorization. Numerical experiments show that on the six circuit matrices tested, our algorithm improves the average performance of matrix multiplication by 5.35x (up to 9.35x) compared with using GEMM directly in the Schur-complement updates. Compared with the state-of-the-art solver SuperLU_DIST, our method shows a substantial performance improvement.
Pages: 6
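
To make the density-aware selection described in the abstract concrete, the following is a minimal illustrative sketch (in Python with NumPy, SciPy, and scikit-learn) of choosing between a dense GEMM path and a sparse SpMM-style path for a single Schur-complement-like update C -= A*B. It is not the authors' implementation: the feature set, the helper names (block_features, update_block), the stand-in training labels, and the use of RandomForestClassifier as the machine-learning selector are assumptions made here purely for illustration.

    # Illustrative sketch only: density-aware choice between a dense GEMM path
    # and a sparse (SpMM-style) path for one Schur-complement-like update
    # C -= A @ B. Features, labels, and the random-forest selector are
    # assumptions made for this example, not the paper's implementation.
    import numpy as np
    import scipy.sparse as sp
    from sklearn.ensemble import RandomForestClassifier

    def block_features(A: np.ndarray, B: np.ndarray) -> np.ndarray:
        """Cheap features of an update block: density of A and problem sizes."""
        density = np.count_nonzero(A) / A.size
        m, k = A.shape
        n = B.shape[1]
        return np.array([density, m, k, n], dtype=float)

    def update_block(C: np.ndarray, A: np.ndarray, B: np.ndarray, use_gemm: bool) -> None:
        """Apply C -= A @ B with either a dense or a sparse kernel."""
        if use_gemm:
            C -= A @ B                               # dense BLAS-3 (GEMM) path
        else:
            C -= np.asarray(sp.csr_matrix(A) @ B)    # sparse-times-dense (SpMM) path

    # Hypothetical offline training data: features of past update blocks and the
    # label of whichever kernel was measured to be faster for each (1 = GEMM).
    rng = np.random.default_rng(0)
    train_x = rng.random((200, 4)) * np.array([1.0, 256.0, 256.0, 256.0])
    train_y = (train_x[:, 0] > 0.3).astype(int)      # stand-in labels for the demo
    selector = RandomForestClassifier(n_estimators=50, random_state=0)
    selector.fit(train_x, train_y)

    # Online use inside the factorization loop (one block shown).
    A = rng.random((128, 64))
    A[rng.random(A.shape) > 0.1] = 0.0               # make A roughly 10% dense
    B = rng.random((64, 96))
    C = rng.random((128, 96))
    use_gemm = bool(selector.predict(block_features(A, B).reshape(1, -1))[0])
    update_block(C, A, B, use_gemm)
    print("kernel chosen:", "GEMM" if use_gemm else "SpMM")

In a real supernodal factorization the decision would be made per panel, and the sparse path would reuse the factor's existing compressed storage rather than converting a dense block on the fly; the conversion above only keeps the example self-contained.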