Duplo: Lifting Redundant Memory Accesses of Deep Neural Networks for GPU Tensor Cores

Cited by: 14
Authors
Kim, Hyeonjin [1 ]
Ahn, Sungwoo [1 ]
Oh, Yunho [2 ]
Kim, Bogil [1 ]
Ro, Won Woo [1 ]
Song, William J. [1 ]
Affiliations
[1] Yonsei Univ, Sch Elect & Elect Engn, Seoul, South Korea
[2] Ecole Polytech Fed Lausanne EPFL, EcoCloud, Lausanne, Vaud, Switzerland
Keywords
Deep Neural Network; GPU; Tensor Core;
DOI
10.1109/MICRO50266.2020.00065
Chinese Library Classification
TP3 [computing technology; computer technology];
Discipline Code
0812 ;
Abstract
This paper introduces a GPU architecture named Duplo that minimizes redundant memory accesses of convolutions in deep neural networks (DNNs). Convolution is one of the fundamental operations used in various classes of DNNs, and it takes the majority of execution time. Various approaches have been proposed to accelerate convolutions via general matrix multiplication (GEMM), Winograd convolution, fast Fourier transform (FFT), etc. The recent introduction of tensor cores in NVIDIA GPUs particularly targets accelerating neural network computations. A tensor core in a streaming multiprocessor (SM) is a specialized unit dedicated to handling matrix-multiply-and-accumulate (MMA) operations. The underlying operations of tensor cores represent GEMM calculations, and lowering a convolution can effectively exploit the tensor cores by transforming deeply nested convolution loops into matrix multiplication. However, lowering the convolution has a critical drawback: it requires a larger memory space (or workspace) to compute the matrix multiplication, and the expanded workspace inevitably creates multiple duplicates of the same data stored at different memory addresses. The proposed Duplo architecture tackles this challenge by leveraging compile-time information and microarchitectural support to detect and eliminate redundant memory accesses that repeatedly load the duplicates of data in the workspace matrix. Duplo identifies data duplication based on memory addresses and convolution information generated by a compiler. It uses a load history buffer (LHB) to trace the recent load history of workspace data and their presence in the register file. Every load instruction of workspace data consults the LHB to check whether copies of the same data may already exist in the register file. If duplicates are found, Duplo simply renames registers to point to the ones containing the same values instead of issuing memory requests to load the same data again.
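The workspace duplication described above can be illustrated with a minimal sketch of im2col-style lowering (a standard GEMM-lowering technique; this toy code is an illustration, not the paper's implementation). Even a small input shows how most workspace entries are copies of data already loaded once:

```python
# Hypothetical sketch: im2col lowering duplicates input elements
# in the GEMM workspace, which is the redundancy Duplo targets.
def im2col(x, k, stride=1):
    """Lower a 2-D input (list of lists) into a workspace matrix:
    one flattened k-by-k patch per convolution output position."""
    h, w = len(x), len(x[0])
    rows = []
    for i in range(0, h - k + 1, stride):
        for j in range(0, w - k + 1, stride):
            rows.append([x[i + di][j + dj]
                         for di in range(k) for dj in range(k)])
    return rows

x = [[r * 4 + c for c in range(4)] for r in range(4)]  # 16 distinct values
ws = im2col(x, k=3)                                    # 4 patches of 9 values
flat = [v for row in ws for v in row]
print(len(flat), len(set(flat)))  # 36 workspace entries, only 16 distinct
```

Here a 4x4 input lowered for a 3x3 kernel yields 36 workspace entries but only 16 distinct values, so more than half of the naive loads would re-fetch data already resident in registers.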
Our experimental results show that Duplo improves the performance of DNNs by 29.4% on average and saves 34.1% of energy when using tensor cores.
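The LHB-based rename mechanism from the abstract can be modeled functionally as follows. This is a toy software analogy under assumed simplifications (a small address-to-register map with oldest-first eviction), not the paper's microarchitecture:

```python
# Toy functional model of a load history buffer (LHB): duplicate
# workspace loads are turned into register renames instead of
# memory requests. Capacity and eviction policy are assumptions.
class LoadHistoryBuffer:
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.entries = {}  # address -> register holding that data

    def load(self, addr, dest_reg):
        """Return ('rename', reg) if addr was recently loaded into a
        live register; else record the mapping and return ('mem', reg)."""
        if addr in self.entries:
            return ('rename', self.entries[addr])
        if len(self.entries) >= self.capacity:
            self.entries.pop(next(iter(self.entries)))  # evict oldest
        self.entries[addr] = dest_reg
        return ('mem', dest_reg)

lhb = LoadHistoryBuffer()
print(lhb.load(0x100, 'r1'))  # first access: goes to memory
print(lhb.load(0x100, 'r2'))  # duplicate: renamed to r1, no memory request
```

In this sketch the second load of address `0x100` never reaches memory; the destination register is simply renamed to the register that already holds the value, mirroring the elision Duplo performs in hardware.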
Pages: 725-737 (13 pages)