Optimizing non-coalesced memory access for irregular applications with GPU computing

Cited by: 0

Authors:

Ran Zheng

Yuan-dong Liu

Hai Jin

Affiliations:

[1] Huazhong University of Science and Technology, National Engineering Research Center for Big Data Technology and System

[2] Huazhong University of Science and Technology, Services Computing Technology and System Lab

[3] Huazhong University of Science and Technology, Cluster and Grid Computing Lab

[4] Huazhong University of Science and Technology, School of Computer Science and Technology

Source:

Frontiers of Information Technology & Electronic Engineering | 2020 / Vol. 21

Keywords:

General purpose graphics processing units; Memory coalescing; Non-coalesced memory access; Data reordering

DOI:

Not available

Chinese Library Classification (CLC) number:

TP319

Abstract:

General purpose graphics processing units (GPGPUs) can considerably improve computing performance for regular applications. However, many applications exhibit irregular memory access, and the benefits of graphics processing units (GPUs) are less substantial for such irregular applications. In recent years, several studies have presented solutions that remove static irregular memory access. However, eliminating dynamic irregular memory access in software remains a serious challenge. A pure software solution, requiring neither hardware extensions nor offline profiling, is proposed to eliminate dynamic irregular memory access, especially indirect memory access. Data reordering and index redirection are used to reduce the number of memory transactions, thereby improving the performance of GPU kernels. To improve the efficiency of data reordering, the reordering operation is offloaded to the GPU, which reduces its overhead and the associated data transfer. By concurrently executing the data reordering and data processing kernels in compute unified device architecture (CUDA) streams, the overhead of data reordering can be reduced further. After these optimizations, the volume of memory transactions is reduced by 16.7%–50% compared with cuSPARSE-based benchmarks, and the performance of irregular kernels is improved by 9.64%–34.9% on an NVIDIA Tesla P4 GPU.
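The idea of data reordering with index redirection can be illustrated with a small CPU-side simulation. This is only a sketch under stated assumptions, not the authors' implementation (the abstract says the reordering is offloaded to the GPU): it assumes 32-thread warps, 128-byte memory transactions, and 4-byte elements, and it uses a simple first-access ordering chosen purely for illustration. The names `count_transactions`, `reorder_and_redirect`, and `demo` are hypothetical.

```python
WARP_SIZE = 32        # threads per warp (assumption)
SEGMENT_BYTES = 128   # bytes served by one memory transaction (assumption)
ELEM_BYTES = 4        # element size, e.g. a 32-bit float (assumption)


def count_transactions(indices):
    """Count the memory segments touched when thread i reads data[indices[i]]."""
    elems_per_segment = SEGMENT_BYTES // ELEM_BYTES
    total = 0
    for start in range(0, len(indices), WARP_SIZE):
        warp = indices[start:start + WARP_SIZE]
        # Every distinct segment touched by a warp costs one transaction.
        total += len({i // elems_per_segment for i in warp})
    return total


def reorder_and_redirect(data, indices):
    """Lay data out in first-access order and rewrite the indices to match."""
    new_pos = {}    # old position -> new position
    new_data = []
    for old in indices:
        if old not in new_pos:
            new_pos[old] = len(new_data)
            new_data.append(data[old])
    # Append elements that are never referenced so that no data is lost.
    for old, value in enumerate(data):
        if old not in new_pos:
            new_pos[old] = len(new_data)
            new_data.append(value)
    # Index redirection: each access now points at the element's new location.
    new_indices = [new_pos[old] for old in indices]
    return new_data, new_indices


def demo():
    n = 1024
    data = [float(i) for i in range(n)]
    # Strided gather: neighbouring threads read elements 32 positions apart.
    indices = [(i * 32) % n + (i * 32) // n for i in range(n)]
    new_data, new_indices = reorder_and_redirect(data, indices)
    # Redirection must leave the gathered values unchanged.
    assert [data[i] for i in indices] == [new_data[i] for i in new_indices]
    return count_transactions(indices), count_transactions(new_indices)


if __name__ == "__main__":
    before, after = demo()
    print("transactions before:", before, "after:", after)
```

In this toy access pattern every warp initially touches 32 different segments; after reordering, the elements a warp reads sit next to each other, so each warp needs a single transaction. Real workloads with repeated or data-dependent indices would see smaller gains, and the cost of the reordering itself has to be amortized, which is the overhead the abstract addresses.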

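Citation:

Frontiers of Information Technology & Electronic Engineering, 2020, 21 (09): 1285-1301

Pages: 1285-1301

Page count: 16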
