Optimizing non-coalesced memory access for irregular applications with GPU computing

Authors
Ran Zheng
Yuan-dong Liu
Hai Jin
Affiliations
[1] Huazhong University of Science and Technology, National Engineering Research Center for Big Data Technology and System
[2] Huazhong University of Science and Technology, Services Computing Technology and System Lab
[3] Huazhong University of Science and Technology, Cluster and Grid Computing Lab
[4] Huazhong University of Science and Technology, School of Computer Science and Technology
Keywords
General-purpose graphics processing units; Memory coalescing; Non-coalesced memory access; Data reordering; TP319
DOI
Not available
Abstract
General-purpose graphics processing units (GPGPUs) can considerably improve computing performance for regular applications. However, many applications exhibit irregular memory access, and the benefits of graphics processing units (GPUs) are less substantial for such irregular applications. Several recent studies have proposed solutions that remove static irregular memory access, but eliminating dynamic irregular memory access in software remains a serious challenge. A pure software solution, requiring neither hardware extensions nor offline profiling, is proposed to eliminate dynamic irregular memory access, especially indirect memory access. Data reordering and index redirection are used to reduce the number of memory transactions, thereby improving the performance of GPU kernels. To make data reordering efficient, the reordering operation is offloaded to the GPU, which reduces its overhead, including the overhead of data transfer. By executing the compute unified device architecture (CUDA) streams for data reordering and for the data-processing kernel concurrently, the overhead of data reordering is further reduced. With these optimizations, the volume of memory transactions is reduced by 16.7%–50% compared with cuSPARSE-based benchmarks, and the performance of irregular kernels is improved by 9.64%–34.9% on an NVIDIA Tesla P4 GPU.
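The core idea in the abstract can be illustrated with a small host-side simulation. Data is reordered so that the elements a warp reads through an index array sit close together, and the index array is then redirected to the new positions. This is a hypothetical sketch, not the authors' algorithm. It assumes 32-thread warps, 128-byte memory transactions, 4-byte elements, an array base aligned to a 128-byte boundary, and a simple "first-touch" ordering heuristic. The names `count_transactions` and `reorder_and_redirect` are invented for this example.

```python
import random
from typing import Dict, List, Sequence, Tuple

WARP_SIZE = 32       # threads per warp on NVIDIA GPUs
SEGMENT_BYTES = 128  # assumed size of one global-memory transaction
ELEM_BYTES = 4       # assumed element size (e.g. float32)


def count_transactions(indices: Sequence[int]) -> int:
    """Model a gather data[indices[t]] where thread t reads one element.

    Consecutive threads are grouped into warps; each warp is charged one
    transaction per distinct SEGMENT_BYTES-aligned segment it touches
    (assumes the array starts on a segment boundary).
    """
    elems_per_segment = SEGMENT_BYTES // ELEM_BYTES
    total = 0
    for start in range(0, len(indices), WARP_SIZE):
        warp = indices[start:start + WARP_SIZE]
        total += len({i // elems_per_segment for i in warp})
    return total


def reorder_and_redirect(
    data: Sequence[float], indices: Sequence[int]
) -> Tuple[List[float], List[int]]:
    """Illustrative first-touch data reordering with index redirection.

    Elements are laid out in the order threads first access them, and
    elements that are never accessed are appended at the end, so the
    reordered array is a permutation of the input. The redirected index
    array satisfies reordered[redirected[t]] == data[indices[t]].
    """
    new_pos: Dict[int, int] = {}
    order: List[int] = []
    for i in indices:               # first touch decides the new position
        if i not in new_pos:
            new_pos[i] = len(order)
            order.append(i)
    for i in range(len(data)):      # keep untouched elements as well
        if i not in new_pos:
            new_pos[i] = len(order)
            order.append(i)
    reordered = [data[i] for i in order]
    redirected = [new_pos[i] for i in indices]
    return reordered, redirected


if __name__ == "__main__":
    random.seed(0)
    n = 4096
    x = [float(v) for v in range(n)]
    idx = [random.randrange(n) for _ in range(1024)]  # hypothetical irregular gather
    x_new, idx_new = reorder_and_redirect(x, idx)
    # Every thread still reads the same value after redirection.
    assert all(x_new[j] == x[i] for i, j in zip(idx, idx_new))
    print("transactions before:", count_transactions(idx))
    print("transactions after: ", count_transactions(idx_new))
```

This sketch only models transaction counts. It does not capture the cost of performing the reordering, which the abstract addresses by running the reordering on the GPU and overlapping it with the processing kernel through CUDA streams.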
Pages: 1285–1301 (16 pages)
Published in: Frontiers of Information Technology & Electronic Engineering, 2020, 21(9)