Irregular accesses reorder unit: improving GPGPU memory coalescing for graph-based workloads

Cited by: 0

Authors:

Albert Segura

Jose Maria Arnau

Antonio Gonzalez

Affiliation:

[1] Universitat Politècnica de Catalunya (UPC), Departament d’Arquitectura de Computadors

Source:

The Journal of Supercomputing | 2023 / Vol. 79

Keywords:

GPGPU; Graph processing; Parallel architectures; Computer architecture

DOI:

Not available

Abstract:

GPGPU architectures have become the dominant platform for massively parallel workloads, delivering high performance and energy efficiency for popular applications such as machine learning, computer vision or self-driving cars. However, irregular applications, such as graph processing, fail to fully exploit GPGPU resources because their divergent memory accesses saturate the memory hierarchy. To reduce the pressure on the memory subsystem for divergent memory-intensive applications, programmers must take into account the SIMT execution model and memory coalescing in GPGPUs, devoting significant effort to complex optimization techniques. Despite these efforts, we show that irregular graph processing still suffers from low GPGPU performance. We observe that in many irregular applications the mapping of data to threads can be safely changed. In other words, it is possible to relax the strict relationship between a thread and the data it processes in order to reduce memory divergence. Based on this observation, we propose the Irregular accesses Reorder Unit (IRU), a novel hardware extension tightly integrated in the GPGPU pipeline. The IRU reorders the data processed by the threads on irregular accesses to improve memory coalescing, i.e., it tries to assign data elements to threads so as to produce coalesced accesses in SIMT groups. Furthermore, the IRU is capable of filtering and merging duplicated accesses, significantly reducing the workload. Programmers can easily utilize the IRU with a simple API, or let the compiler issue instructions from our extended ISA. We evaluate our proposal on state-of-the-art graph-based algorithms and a wide selection of applications. Results show that the IRU achieves a memory coalescing improvement of 1.32x and a 46% reduction in the overall traffic in the memory hierarchy, which results in a 1.33x speedup and 13% energy savings on average, while incurring a small 5.6% area overhead.
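
The effect described in the abstract (reassigning data elements to threads so that each SIMT group touches fewer memory blocks, and merging duplicated accesses) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the IRU hardware or its API: the function names `count_transactions` and `reorder` are hypothetical, and the 32-thread warp and 32-element memory block are assumed values chosen for illustration. Sorting is used here as a simple stand-in for whatever reordering the hardware performs.

```python
def count_transactions(indices, warp_size=32, block_elems=32):
    """Count memory blocks touched, summed over warps.

    Threads are grouped into warps in order; each warp issues one
    transaction per distinct memory block its threads access.
    warp_size and block_elems are illustrative assumptions.
    """
    total = 0
    for start in range(0, len(indices), warp_size):
        warp = indices[start:start + warp_size]
        total += len({i // block_elems for i in warp})
    return total


def reorder(indices, merge_duplicates=True):
    """Hypothetical software stand-in for reordering irregular accesses.

    Sorting places neighbouring elements in the same warp. Merging
    duplicates is only valid when the application can safely process
    each element once (an assumption of this sketch).
    """
    if merge_duplicates:
        indices = set(indices)
    return sorted(indices)


# Deterministic "irregular" access pattern: a permutation of 0..1023
# with stride 37, so adjacent threads hit different memory blocks.
irregular = [(i * 37) % 1024 for i in range(1024)]

before = count_transactions(irregular)
after = count_transactions(reorder(irregular))
print("transactions before reordering:", before)
print("transactions after reordering:", after)
```

In this model the reordered version needs one transaction per warp, because every warp reads a single contiguous block. The reordering is legal only under the abstract's observation that the mapping of data to threads can be safely changed.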

Citation

Pages: 762-787

Page count: 25

[1] Irregular accesses reorder unit: improving GPGPU memory coalescing for graph-based workloads
Segura, Albert
Arnau, Jose Maria
Gonzalez, Antonio
[J]. JOURNAL OF SUPERCOMPUTING, 2023, 79 (01) : 762 - 787