Exploiting Direct Memory Operands in GPU Instructions

Cited by: 0
Authors
Mohammadpur-Fard, Ali [1]
Darabi, Sina [2, 3]
Falahati, Hajar [2, 4]
Mahani, Negin [2, 4, 5]
Sarbazi-Azad, Hamid [1, 2]
Affiliations
[1] Sharif Univ Technol, Dept Comp Engn, Tehran 111559466, Iran
[2] Inst Res Fundamental Sci IPM, Sch Comp Sci, Tehran 195383351, Iran
[3] Univ Svizzera Italiana USI, Fac Informat, CH-6900 Lugano, Switzerland
[4] Barcelona Supercomp Ctr BSC, Barcelona 08034, Spain
[5] Shahid Bahonar Univ, Dept Comp Engn, Higher Educ Complex Zarand, Kerman 761691411, Iran
Keywords
Registers; Graphics processing units; Computer architecture; Reduced instruction set computing; Arithmetic; Hardware; Standards; CISC; GPGPU; RISC; register file
DOI
10.1109/LCA.2024.3371062
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
GPUs are widely used for diverse applications, particularly data-parallel tasks such as machine learning and scientific computing. However, their efficiency is hindered by architectural limitations in handling memory loads, inherited from historical RISC processors, which cause high register file contention. We observe that a significant fraction (around 26%) of the values present in the register file are typically used only once, and that these values contribute to more than 25% of the total register file bank conflicts, on average. This paper addresses the challenge of single-use memory values in the GPU register file (i.e., data values used only once), which waste space and increase latency. To this end, we introduce a novel mechanism, inspired by CISC architectures, that replaces single-use loads with direct memory operands in arithmetic operations. Our approach improves performance by 20% and reduces energy consumption by 18%, on average, with negligible (<1%) hardware overhead.
Pages: 162-165
Page count: 4
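
The abstract states that single-use loads are replaced with direct memory operands in arithmetic operations, but it does not say how the replacement is carried out. As a rough illustration of the idea only, the following Python sketch applies such a rewrite to a toy instruction list. Everything in it is an assumption made for illustration and not taken from the paper: the `Instr` and `Mem` types, the opcode names, the SSA-like rule that each register is written at most once, and the conservative rule that a store forces all deferred loads to be emitted first.

```python
from collections import Counter
from dataclasses import dataclass

ARITH_OPS = {"add", "sub", "mul"}  # hypothetical toy opcode set


@dataclass(frozen=True)
class Mem:
    base: str  # register that holds the address


@dataclass
class Instr:
    op: str     # "ld", "st", or an arithmetic opcode
    dst: str    # destination register ("" for stores)
    srcs: list  # register names (str) or Mem operands


def regs_read(ins):
    """Registers an instruction reads, including address registers."""
    return [s.base if isinstance(s, Mem) else s for s in ins.srcs]


def fuse_single_use_loads(prog):
    """Fold each load whose result is read exactly once into its consumer.

    Assumes each register is written at most once and that arithmetic
    instructions in the input take register operands only.
    """
    uses = Counter(r for ins in prog for r in regs_read(ins))
    out, pending = [], {}  # pending: dst register -> Mem of a deferred load
    for ins in prog:
        if ins.op in ARITH_OPS:
            # Swap in the memory operand for any deferred single-use value.
            srcs = [pending.pop(s, s) if isinstance(s, str) else s
                    for s in ins.srcs]
            out.append(Instr(ins.op, ins.dst, srcs))
            continue
        # Non-arithmetic consumers need the value in a register; a store may
        # alias a deferred address, so it forces every deferred load out.
        needed = (list(pending) if ins.op == "st"
                  else [r for r in regs_read(ins) if r in pending])
        for r in needed:
            out.append(Instr("ld", r, [pending.pop(r)]))
        if ins.op == "ld" and uses[ins.dst] == 1:
            pending[ins.dst] = ins.srcs[0]  # defer: candidate for fusion
        else:
            out.append(ins)
    # Safety net for malformed input where a deferred value was never consumed.
    out.extend(Instr("ld", r, [m]) for r, m in pending.items())
    return out


prog = [
    Instr("ld", "r2", [Mem("a1")]),          # r2 is read twice: stays a load
    Instr("ld", "r1", [Mem("a0")]),          # r1 is read once: fused below
    Instr("add", "r3", ["r1", "r2"]),
    Instr("mul", "r4", ["r3", "r2"]),
    Instr("st", "", [Mem("a2"), "r4"]),
]
fused = fuse_single_use_loads(prog)
```

In this example the load into `r1` disappears and the `add` reads `Mem("a0")` directly, so `r1` never occupies a register, while the load into `r2` is kept because its value is read twice.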