Irregular accesses reorder unit: improving GPGPU memory coalescing for graph-based workloads

Authors
Albert Segura
Jose Maria Arnau
Antonio Gonzalez
Affiliation
[1] Universitat Politècnica de Catalunya (UPC), Departament d’Arquitectura de Computadors
Keywords
GPGPU; Graph processing; Parallel architectures; Computer architecture;
Abstract
GPGPU architectures have become the dominant platform for massively parallel workloads, delivering high performance and energy efficiency for popular applications such as machine learning, computer vision or self-driving cars. However, irregular applications, such as graph processing, fail to fully exploit GPGPU resources because their divergent memory accesses saturate the memory hierarchy. To reduce the pressure on the memory subsystem for divergent memory-intensive applications, programmers must take into account the SIMT execution model and memory coalescing in GPGPUs, devoting significant effort to complex optimization techniques. Despite these efforts, we show that irregular graph processing still suffers from low GPGPU performance. We observe that in many irregular applications the mapping of data to threads can be safely changed. In other words, it is possible to relax the strict relationship between a thread and the data it processes in order to reduce memory divergence. Based on this observation, we propose the Irregular accesses Reorder Unit (IRU), a novel hardware extension tightly integrated in the GPGPU pipeline. The IRU reorders the data processed by the threads on irregular accesses to improve memory coalescing, i.e., it tries to assign data elements to threads so as to produce coalesced accesses in SIMT groups. Furthermore, the IRU is capable of filtering and merging duplicated accesses, significantly reducing the workload. Programmers can easily utilize the IRU with a simple API, or let the compiler issue instructions from our extended ISA. We evaluate our proposal for state-of-the-art graph-based algorithms and a wide selection of applications. Results show that the IRU achieves a memory coalescing improvement of 1.32x and a 46% reduction in the overall traffic in the memory hierarchy, which results in a 1.33x speedup and 13% energy savings on average, while incurring a small 5.6% area overhead.
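The effect of remapping data to threads can be illustrated with a small software model. The following Python sketch is not the IRU hardware or its API: it only counts how many distinct memory lines each SIMT group touches before and after the element indices are reassigned to threads. The warp size, line size and element size are assumed values, the function names are hypothetical, and sorting is used purely as a stand-in for the reordering step, whose mechanism the abstract does not describe.

```python
import random

WARP_SIZE = 32    # threads per SIMT group (assumed value)
LINE_BYTES = 128  # bytes served by one memory transaction (assumed value)
ELEM_BYTES = 4    # size of one data element (assumed value)


def transactions(indices):
    """Count memory transactions when each consecutive group of
    WARP_SIZE indices is accessed by one warp."""
    total = 0
    for start in range(0, len(indices), WARP_SIZE):
        warp = indices[start:start + WARP_SIZE]
        # Every distinct memory line touched by the warp costs one transaction.
        lines = {(i * ELEM_BYTES) // LINE_BYTES for i in warp}
        total += len(lines)
    return total


def reorder(indices, merge_duplicates=False):
    """Reassign indices to threads so that neighbouring threads touch
    neighbouring data. Sorting is only an illustrative stand-in."""
    if merge_duplicates:
        # Filter repeated indices so each element is processed once.
        indices = set(indices)
    return sorted(indices)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical irregular access stream, e.g. neighbour ids in a graph.
    frontier = [rng.randrange(4096) for _ in range(1024)]
    print("original order:      ", transactions(frontier))
    print("reordered:           ", transactions(reorder(frontier)))
    print("reordered and merged:", transactions(reorder(frontier, True)))
```

In this model a warp reading 32 consecutive 4-byte elements needs a single transaction, while a warp whose threads each read a different line needs 32. Reordering is only valid when the application does not depend on which thread processes which element, which is the relaxation the abstract relies on.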
Pages: 762-787
Page count: 25
Source
Segura, Albert; Arnau, Jose Maria; Gonzalez, Antonio. Irregular accesses reorder unit: improving GPGPU memory coalescing for graph-based workloads [J]. Journal of Supercomputing, 2023, 79 (01): 762-787