Exploiting Direct Memory Operands in GPU Instructions

Authors
Mohammadpur-Fard, Ali [1]
Darabi, Sina [2, 3]
Falahati, Hajar [2, 4]
Mahani, Negin [2, 4, 5]
Sarbazi-Azad, Hamid [1, 2]
Affiliations
[1] Sharif Univ Technol, Dept Comp Engn, Tehran 111559466, Iran
[2] Inst Res Fundamental Sci IPM, Sch Comp Sci, Tehran 195383351, Iran
[3] Univ Svizzera Italiana USI, Fac Informat, CH-6900 Lugano, Switzerland
[4] Barcelona Supercomp Ctr BSC, Barcelona 08034, Spain
[5] Shahid Bahonar Univ, Dept Comp Engn, Higher Educ Complex Zarand, Kerman 761691411, Iran
Keywords
Registers; Graphics processing units; Computer architecture; Reduced instruction set computing; Arithmetic; Hardware; Standards; CISC; GPGPU; RISC; register file
DOI
10.1109/LCA.2024.3371062
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
GPUs are widely used for diverse applications, particularly data-parallel tasks such as machine learning and scientific computing. However, their efficiency is hindered by architectural limitations in handling memory loads, inherited from historical RISC processors, which cause high register file contention. We observe that a significant fraction (around 26%) of the values in the register file are typically used only once, and that these values contribute more than 25% of all register file bank conflicts, on average. This paper addresses the problem of single-use memory values in the GPU register file (i.e., data values used only once), which waste space and increase latency. To this end, we introduce a novel mechanism, inspired by CISC architectures, that replaces single-use loads with direct memory operands in arithmetic operations. Our approach improves performance by 20% and reduces energy consumption by 18%, on average, with negligible (<1%) hardware overhead.
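The idea of replacing single-use loads with direct memory operands can be illustrated with a minimal sketch. This is not the paper's mechanism or its hardware design: the tuple-based instruction format, the `ARITHMETIC` opcode set, and the function name `fuse_single_use_loads` are all hypothetical, and the sketch assumes each register is written at most once and that no store touches a loaded address between the load and its consumer.

```python
from collections import defaultdict

# Hypothetical toy format: (opcode, destination, [sources]).
# Memory operands are written as "[name]"; everything else is a register.
ARITHMETIC = {"add", "mul", "fma"}  # assumed set of arithmetic opcodes


def fuse_single_use_loads(program):
    """Fold loads whose result is read exactly once into their consumer.

    Assumes each register is written at most once and that no store
    between a load and its consumer modifies the loaded address.
    """
    # Record, for every source operand, which opcodes read it.
    readers = defaultdict(list)
    for op, _, srcs in program:
        for src in srcs:
            readers[src].append(op)

    # A load is fusible if its destination is read exactly once,
    # and that single read is by an arithmetic instruction.
    fusible = {
        dst: srcs[0]
        for op, dst, srcs in program
        if op == "ld"
        and len(readers[dst]) == 1
        and readers[dst][0] in ARITHMETIC
    }

    fused = []
    for op, dst, srcs in program:
        if op == "ld" and dst in fusible:
            continue  # the load is dropped; its consumer reads memory directly
        fused.append((op, dst, [fusible.get(s, s) for s in srcs]))
    return fused


program = [
    ("ld", "r1", ["[a]"]),        # r1 is read once, by the add
    ("ld", "r2", ["[b]"]),        # r2 is read twice, so it stays in a register
    ("add", "r3", ["r1", "r2"]),
    ("mul", "r4", ["r3", "r2"]),
]

for instruction in fuse_single_use_loads(program):
    print(instruction)
```

In this toy program the load into `r1` disappears and the `add` takes `[a]` as a direct memory operand, so `r1` never occupies a register file entry. The load into `r2` is kept because its value is reused.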
Pages: 162-165
Number of pages: 4