Cache performance optimization of irregular sparse matrix multiplication on modern multi-core CPU and GPU

Cited by: 0

Authors:

Liu Li (刘力) [1]

Yang Guangwen [1]

Affiliations:

[1] Department of Computer Science and Technology, Tsinghua University

Source:

High Technology Letters | 2013 / Vol. 19 / No. 04

Keywords:

sparse matrix multiplication; cache miss; scalability; multi-core CPU; GPU

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301.6 [Algorithm theory]

Discipline classification code:

081202

Abstract:

This paper focuses on optimizing the cache performance of sparse matrix-matrix multiplication (SpGEMM). It classifies cache misses into two categories: those caused by the irregular distribution pattern of the multiplier matrix, and those caused by the multiplicand. The paper proposes an optimization method for each category. The first, a hash-based method, effectively removes cache misses of the 1st category and improves performance by a factor of 6 on an Intel 8-core CPU in the best cases. For cache misses of the 2nd category, the paper proposes a new cache replacement algorithm that achieves a much higher cache hit rate than other algorithms based on historical knowledge and is applicable to CELL and GPU. To further verify the effectiveness of these methods, the algorithm is implemented on a GPU, where performance scales perfectly with the size of on-chip storage.


Pages: 339-345

Page count: 7
