Optimizing non-coalesced memory access for irregular applications with GPU computing

Authors
Ran Zheng
Yuan-dong Liu
Hai Jin
Affiliations
[1] Huazhong University of Science and Technology, National Engineering Research Center for Big Data Technology and System
[2] Huazhong University of Science and Technology, Services Computing Technology and System Lab
[3] Huazhong University of Science and Technology, Cluster and Grid Computing Lab
[4] Huazhong University of Science and Technology, School of Computer Science and Technology
Keywords
General purpose graphics processing units; Memory coalescing; Non-coalesced memory access; Data reordering
DOI
Not available
CLC number
TP319
Abstract
General purpose graphics processing units (GPGPUs) can considerably improve computing performance for regular applications. However, many applications exhibit irregular memory access, and for these irregular applications the benefits of graphics processing units (GPUs) are less substantial. In recent years, several studies have presented solutions that remove static irregular memory access, but eliminating dynamic irregular memory access in software remains a serious challenge. A pure software solution, requiring neither hardware extensions nor offline profiling, is proposed to eliminate dynamic irregular memory access, especially indirect memory access. Data reordering and index redirection are used to reduce the number of memory transactions, thereby improving the performance of GPU kernels. To improve the efficiency of data reordering, the reordering operation is offloaded to the GPU, which reduces its overhead and the associated data transfer. By executing the data reordering and the data processing kernel concurrently in compute unified device architecture (CUDA) streams, the overhead of data reordering can be reduced. With these optimizations, the volume of memory transactions is reduced by 16.7%–50% compared with cuSPARSE-based benchmarks, and the performance of irregular kernels is improved by 9.64%–34.9% on an NVIDIA Tesla P4 GPU.
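The idea of data reordering with index redirection can be illustrated with a small host-side model. The sketch below is not the authors' algorithm: it assumes a hypothetical "first-touch" ordering, 32-thread warps, and 128-byte memory segments holding 4-byte elements, and it simply counts how many distinct segments each warp touches for an indirect access of the form `data[indices[t]]`. The function names `count_transactions` and `reorder_first_touch` are made up for this illustration.

```python
from typing import Dict, List, Tuple

WARP_SIZE = 32          # threads per warp on NVIDIA GPUs
ELEMS_PER_SEGMENT = 32  # assumption: 128-byte segment of 4-byte elements


def count_transactions(indices: List[int]) -> int:
    """Sum, over all warps, the number of distinct segments touched
    by the indirect access data[indices[t]]."""
    total = 0
    for start in range(0, len(indices), WARP_SIZE):
        warp = indices[start:start + WARP_SIZE]
        total += len({i // ELEMS_PER_SEGMENT for i in warp})
    return total


def reorder_first_touch(
    data: List[float], indices: List[int]
) -> Tuple[List[float], List[int]]:
    """Place elements in the order threads first touch them (data
    reordering) and rewrite the indices to match (index redirection)."""
    new_pos: Dict[int, int] = {}
    for i in indices:
        if i not in new_pos:
            new_pos[i] = len(new_pos)
    # Elements that no thread references follow the touched ones.
    for i in range(len(data)):
        if i not in new_pos:
            new_pos[i] = len(new_pos)
    new_data = [0.0] * len(data)
    for old, new in new_pos.items():
        new_data[new] = data[old]
    new_indices = [new_pos[i] for i in indices]
    return new_data, new_indices


if __name__ == "__main__":
    # Strided gather: every thread in a warp lands in a different segment.
    data = [float(i) for i in range(4096)]
    indices = [(t * 64) % 4096 for t in range(64)]
    new_data, new_indices = reorder_first_touch(data, indices)
    print("segments touched before:", count_transactions(indices))
    print("segments touched after: ", count_transactions(new_indices))
```

The kernel reads the same values either way, since `new_data[new_indices[t]]` equals `data[indices[t]]` for every thread; only the memory layout changes. On a real GPU the transaction granularity and caching behavior depend on the architecture, so this model only indicates the direction of the effect.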
Pages: 1285-1301
Number of pages: 16