Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks

Cited by: 107
Authors:
Rhu, Minsoo [1 ]
O'Connor, Mike [2 ]
Chatterjee, Niladrish [2 ]
Pool, Jeff [2 ]
Kwon, Youngeun [1 ]
Keckler, Stephen W. [2 ]
Affiliations:
[1] POSTECH, Pohang, South Korea
[2] NVIDIA, Santa Clara, CA USA
DOI: 10.1109/HPCA.2018.00017
Chinese Library Classification (CLC): TP3 [Computing technology; computer technology]
Discipline Code: 0812
Abstract:
Popular deep learning frameworks require users to fine-tune their memory usage so that the training data of a deep neural network (DNN) fits within the GPU physical memory. Prior work tries to address this restriction by virtualizing the memory usage of DNNs, enabling both CPU and GPU memory to be utilized for memory allocations. Despite its merits, virtualizing memory can incur significant performance overheads when the time needed to copy data back and forth from CPU memory is higher than the latency to perform DNN computations. We introduce a high-performance virtualization strategy based on a "compressing DMA engine" (cDMA) that drastically reduces the size of the data structures that are targeted for CPU-side allocations. The cDMA engine offers an average 2.6x (maximum 13.8x) compression ratio by exploiting the sparsity inherent in offloaded data, improving the performance of virtualized DNNs by an average 53% (maximum 79%) when evaluated on an NVIDIA Titan Xp.
Pages: 78 - 91 (14 pages)
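The abstract describes offloading DNN activation data to CPU memory and shrinking the transfers by exploiting the sparsity of the offloaded tensors. As an illustration only (not the paper's hardware DMA design), the NumPy sketch below shows zero-value compression of the kind such an engine can apply to ReLU activations: a one-bit-per-element presence mask plus the packed non-zero values. The function names and the ~70% sparsity level are assumptions chosen to roughly match the average compression ratio reported in the abstract.

```python
import numpy as np

def zvc_compress(activations: np.ndarray):
    """Zero-value compression sketch: presence bitmask + packed non-zeros.

    Illustrative software model only; the paper implements compression
    inside a DMA engine in hardware.
    """
    flat = activations.ravel()
    mask = flat != 0.0                    # one bit per element in hardware
    packed = flat[mask]                   # non-zero values only
    return np.packbits(mask), packed, activations.shape

def zvc_decompress(mask_bits, packed, shape):
    """Rebuild the dense tensor from the bitmask and packed non-zeros."""
    n = int(np.prod(shape))
    mask = np.unpackbits(mask_bits)[:n].astype(bool)
    flat = np.zeros(n, dtype=packed.dtype)
    flat[mask] = packed
    return flat.reshape(shape)

# Example: ReLU-style activations with roughly 70% zeros (assumed sparsity).
acts = np.maximum(np.random.randn(64, 256).astype(np.float32) - 0.5, 0.0)
mask_bits, packed, shape = zvc_compress(acts)
orig_bytes = acts.nbytes
comp_bytes = mask_bits.nbytes + packed.nbytes
print(f"compression ratio: {orig_bytes / comp_bytes:.2f}x")
assert np.array_equal(zvc_decompress(mask_bits, packed, shape), acts)
```

At ~70% zeros this prints a ratio near 3x, in the same range as the 2.6x average the abstract reports; the exact figure depends entirely on the sparsity of the layer being offloaded.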
Related Papers (50 records; items [21] - [30] shown below):
  • [21] Compressing deep-quaternion neural networks with targeted regularisation
    Vecchi, Riccardo
    Scardapane, Simone
    Comminiello, Danilo
    Uncini, Aurelio
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2020, 5 (03) : 172 - 176
  • [22] TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training
    Mahmoud, Mostafa
    Edo, Isak
    Zadeh, Ali Hadi
    Awad, Omar Mohamed
    Pekhimenko, Gennady
    Albericio, Jorge
    Moshovos, Andreas
    2020 53RD ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO 2020), 2020, : 781 - 795
  • [23] Sparsity-Aware Caches to Accelerate Deep Neural Networks
    Ganesan, Vinod
    Sen, Sanchari
    Kumar, Pratyush
    Gala, Neel
    Veezhinathan, Kamakoti
    Raghunathan, Anand
    PROCEEDINGS OF THE 2020 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2020), 2020, : 85 - 90
  • [24] Chordal Sparsity for Lipschitz Constant Estimation of Deep Neural Networks
    Xue, Anton
    Lindemann, Lars
    Robey, Alexander
    Hassani, Hamed
    Pappas, George J.
    Alur, Rajeev
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 3389 - 3396
  • [25] POSTER: Exploiting the Input Sparsity to Accelerate Deep Neural Networks
    Dong, Xiao
    Liu, Lei
    Li, Guangli
    Li, Jiansong
    Zhao, Peng
    Wang, Xueying
    Feng, Xiaobing
    PROCEEDINGS OF THE 24TH SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '19), 2019, : 401 - 402
  • [26] Variance-Guided Structured Sparsity in Deep Neural Networks
    Pandit, M. K.
    Banday, M.
    IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, 2023, 4 (06) : 1714 - 1723
  • [27] Acorns: A Framework for Accelerating Deep Neural Networks with Input Sparsity
    Dong, Xiao
    Liu, Lei
    Zhao, Peng
    Li, Guangli
    Li, Jiansong
    Wang, Xueying
    Feng, Xiaobing
    2019 28TH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT 2019), 2019, : 178 - 191
  • [28] Sparsity-aware generalization theory for deep neural networks
    Muthukumar, Ramchandran
    Sulam, Jeremias
    THIRTY-SIXTH ANNUAL CONFERENCE ON LEARNING THEORY, VOL 195, 2023, 195
  • [29] Small Is Beautiful: Compressing Deep Neural Networks for Partial Domain Adaptation
    Ma, Yuzhe
    Yao, Xufeng
    Chen, Ran
    Li, Ruiyu
    Shen, Xiaoyong
    Yu, Bei
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (03) : 3575 - 3585
  • [30] Compressing Deep Neural Networks using a Rank-Constrained Topology
    Nakkiran, Preetum
    Alvarez, Raziel
    Prabhavalkar, Rohit
    Parada, Carolina
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1473 - 1477