Duplo: Lifting Redundant Memory Accesses of Deep Neural Networks for GPU Tensor Cores

Cited by: 14
Authors
Kim, Hyeonjin [1 ]
Ahn, Sungwoo [1 ]
Oh, Yunho [2 ]
Kim, Bogil [1 ]
Ro, Won Woo [1 ]
Song, William J. [1 ]
Affiliations
[1] Yonsei Univ, Sch Elect & Elect Engn, Seoul, South Korea
[2] Ecole Polytech Fed Lausanne EPFL, EcoCloud, Lausanne, Vaud, Switzerland
Keywords
Deep Neural Network; GPU; Tensor Core
DOI
10.1109/MICRO50266.2020.00065
Chinese Library Classification (CLC): TP3 [Computing technology and computer technology]
Discipline Code: 0812
Abstract
This paper introduces a GPU architecture named Duplo that minimizes redundant memory accesses of convolutions in deep neural networks (DNNs). Convolution is one of the fundamental operations used in various classes of DNNs, and it accounts for the majority of execution time. Various approaches have been proposed to accelerate convolutions via general matrix multiplication (GEMM), Winograd convolution, fast Fourier transform (FFT), etc. The recent introduction of tensor cores in NVIDIA GPUs specifically targets accelerating neural network computations. A tensor core in a streaming multiprocessor (SM) is a specialized unit dedicated to handling matrix-multiply-and-accumulate (MMA) operations. The underlying operations of tensor cores are GEMM calculations, and lowering a convolution can effectively exploit the tensor cores by transforming deeply nested convolution loops into matrix multiplication. However, lowering the convolution has a critical drawback: it requires a larger memory space (or workspace) to compute the matrix multiplication, and the expanded workspace inevitably creates multiple duplicates of the same data stored at different memory addresses. The proposed Duplo architecture tackles this challenge by leveraging compile-time information and microarchitectural support to detect and eliminate redundant memory accesses that repeatedly load duplicates of data in the workspace matrix. Duplo identifies data duplication based on memory addresses and convolution information generated by a compiler. It uses a load history buffer (LHB) to trace the recent load history of workspace data and their presence in the register file. Every load instruction of workspace data consults the LHB to check whether copies of the same data may already reside in the register file. If duplicates are found, Duplo simply renames registers to point to the ones containing the same values instead of issuing memory requests to load the same data again. Our experimental results show that Duplo improves the performance of DNNs by 29.4% on average and saves 34.1% of energy when using tensor cores.
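To make the duplication problem concrete, the following Python sketch (not from the paper; the 8x8 input, 3x3 kernel, and 64-entry FIFO-evicted history are illustrative assumptions) lowers a small convolution to a GEMM workspace via im2col and then counts, with a simple software analogue of the load history buffer, how many workspace loads could be served from previously loaded values instead of memory.

```python
import numpy as np
from collections import OrderedDict

def im2col(x, kh, kw, stride=1):
    """Lower a single-channel 2D input into a GEMM workspace matrix.
    Each row holds one kh x kw receptive field, so overlapping windows
    copy the same input element to several workspace addresses."""
    h, w = x.shape
    oh = (h - kh) // stride + 1
    ow = (w - kw) // stride + 1
    cols = np.empty((oh * ow, kh * kw), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            cols[i * ow + j] = patch.ravel()
    return cols

def count_loads_with_history(x, kh, kw, stride=1, entries=64):
    """Software analogue (an assumption, not the hardware LHB design) of
    tracking recently loaded source addresses: a load whose source address
    is still in the history could reuse a register instead of going to memory."""
    h, w = x.shape
    oh = (h - kh) // stride + 1
    ow = (w - kw) // stride + 1
    history = OrderedDict()   # recently loaded input addresses, FIFO-evicted
    issued = reused = 0
    for i in range(oh):
        for j in range(ow):
            for di in range(kh):
                for dj in range(kw):
                    addr = (i * stride + di) * w + (j * stride + dj)
                    if addr in history:
                        reused += 1      # duplicate: rename to an existing register
                    else:
                        issued += 1      # real memory request
                        history[addr] = True
                        if len(history) > entries:
                            history.popitem(last=False)
    return issued, reused

x = np.arange(64, dtype=np.float32).reshape(8, 8)
workspace = im2col(x, 3, 3)
print(workspace.size, "workspace elements vs", x.size, "distinct input values")
issued, reused = count_loads_with_history(x, 3, 3)
print(issued, "memory requests issued,", reused, "loads served by register reuse")
```

In this toy case the lowered workspace holds 324 elements even though the input has only 64 distinct values, which is exactly the kind of redundancy the LHB-based register renaming is designed to exploit.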
Pages: 725-737 (13 pages)