Cache performance optimization of irregular sparse matrix multiplication on modern multi-core CPU and GPU

Authors
Liu Li (刘力) [1]
Yang Guangwen [1]
Affiliation
[1] Department of Computer Science and Technology, Tsinghua University
Keywords
sparse matrix multiplication; cache miss; scalability; multi-core CPU; GPU
DOI
Not available
Chinese Library Classification number
TP301.6 [Algorithm theory]
Discipline classification code
081202
Abstract
This paper focuses on optimizing the cache performance of sparse matrix-matrix multiplication (SpGEMM). It classifies cache misses into two categories: those caused by the irregular distribution pattern of the multiplier matrix, and those caused by the multiplicand. It proposes an optimization method for each category. The first, a hash-based method, effectively removes cache misses of the first category and improves performance by a factor of 6 on an Intel 8-core CPU in the best cases. For cache misses of the second category, the paper proposes a new cache replacement algorithm that achieves a much higher cache hit rate than other algorithms based on historical knowledge and that is applicable to CELL and GPU. To further verify the effectiveness of these methods, the authors implement the algorithm on GPU, where the performance scales perfectly with the size of on-chip storage.
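The abstract does not spell out the hash-based method, so the following is only a generic sketch of hash-based accumulation in row-wise SpGEMM, not the authors' algorithm. It assumes a hypothetical representation in which each matrix is a list of rows and each row is a dict mapping column index to value; the function name `spgemm_hash` is likewise made up for illustration. The point it shows is that a hash accumulator stores only the nonzeros of the current output row, so the working set stays small regardless of how the column indices are scattered.

```python
def spgemm_hash(a_rows, b_rows):
    """Row-wise sparse product C = A * B with a hash accumulator.

    a_rows, b_rows: lists of rows, each row a dict {column: value}
    (an assumed layout for this sketch, not taken from the paper).
    """
    c_rows = []
    for a_row in a_rows:
        acc = {}  # holds only the nonzeros of this output row
        for k, a_val in a_row.items():
            # Row k of B is scaled by A[i][k] and merged into the accumulator.
            for j, b_val in b_rows[k].items():
                acc[j] = acc.get(j, 0.0) + a_val * b_val
        c_rows.append(acc)
    return c_rows


# A is 2x3 and B is 3x3, both stored as lists of sparse rows.
A = [{0: 1.0, 2: 2.0}, {1: 3.0}]
B = [{1: 4.0}, {0: 5.0, 2: 6.0}, {1: 7.0}]
C = spgemm_hash(A, B)
```

A production implementation would more likely use a compressed format such as CSR and a fixed-size open-addressing table sized to fit in cache; the dict is used here only to keep the sketch short.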
Pages: 339-345
Number of pages: 7