Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks

Cited by: 107
Authors
Rhu, Minsoo [1]
O'Connor, Mike [2]
Chatterjee, Niladrish [2]
Pool, Jeff [2]
Kwon, Youngeun [1]
Keckler, Stephen W. [2]
Affiliations
[1] POSTECH, Pohang, South Korea
[2] NVIDIA, Santa Clara, CA, USA
DOI
10.1109/HPCA.2018.00017
CLC number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Popular deep learning frameworks require users to fine-tune their memory usage so that the training data of a deep neural network (DNN) fits within the GPU physical memory. Prior work tries to address this restriction by virtualizing the memory usage of DNNs, enabling both CPU and GPU memory to be utilized for memory allocations. Despite its merits, virtualizing memory can incur significant performance overheads when the time needed to copy data back and forth from CPU memory is higher than the latency to perform DNN computations. We introduce a high-performance virtualization strategy based on a "compressing DMA engine" (cDMA) that drastically reduces the size of the data structures that are targeted for CPU-side allocations. The cDMA engine offers an average 2.6x (maximum 13.8x) compression ratio by exploiting the sparsity inherent in offloaded data, improving the performance of virtualized DNNs by an average 53% (maximum 79%) when evaluated on an NVIDIA Titan Xp.
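The compression the abstract alludes to exploits the large fraction of zero-valued activations that DNN layers (e.g., ReLU) produce. Below is a minimal sketch of zero-value compression, one common way to exploit that sparsity: a per-element bitmask plus the densely packed nonzero values. The function names, block layout, and sparsity level are assumptions made for this illustration, not the paper's hardware design.

```python
# Minimal sketch of zero-value compression (ZVC) for a sparse activation buffer.
# Illustrative only: function names and parameters are assumptions, not the
# cDMA engine's actual implementation.
import numpy as np

def zvc_compress(block: np.ndarray):
    """Split a flat block into a presence bitmask plus its packed nonzero values."""
    flat = block.ravel()
    mask = flat != 0.0                      # 1 bit per element in a hardware scheme
    nonzeros = flat[mask]                   # densely packed nonzero payload
    return np.packbits(mask), nonzeros

def zvc_decompress(mask_bits: np.ndarray, nonzeros: np.ndarray, n: int):
    """Rebuild the original dense block from the bitmask and packed nonzeros."""
    mask = np.unpackbits(mask_bits, count=n).astype(bool)
    out = np.zeros(n, dtype=nonzeros.dtype)
    out[mask] = nonzeros
    return out

# Example: a ReLU-style activation buffer that is roughly 70% zeros.
rng = np.random.default_rng(0)
acts = np.maximum(rng.normal(-0.5, 1.0, size=1 << 20), 0).astype(np.float32)

mask_bits, nz = zvc_compress(acts)
restored = zvc_decompress(mask_bits, nz, acts.size)
assert np.array_equal(acts, restored)       # lossless round trip

uncompressed = acts.nbytes
compressed = mask_bits.nbytes + nz.nbytes   # bitmask + nonzero payload
print(f"compression ratio: {uncompressed / compressed:.2f}x")
```

With about 70% of the activations zeroed, this sketch reports a ratio near 3x, in the same ballpark as the 2.6x average the abstract cites; the achievable ratio depends directly on how sparse the offloaded data is.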
Pages: 78-91
Page count: 14