CIRCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices

Cited by: 171
Authors
Ding, Caiwen [1 ]
Liao, Siyu [2 ]
Wang, Yanzhi [1 ]
Li, Zhe [1 ]
Liu, Ning [1 ]
Zhuo, Youwei [3 ]
Wang, Chao [3 ]
Qian, Xuehai [3 ]
Bai, Yu [4 ]
Yuan, Geng [1 ]
Ma, Xiaolong [1 ]
Zhang, Yipeng [1 ]
Tang, Jian [1 ]
Qiu, Qinru [1 ]
Lin, Xue [5 ]
Yuan, Bo [2 ]
Affiliations
[1] Syracuse Univ, Syracuse, NY 13244 USA
[2] CUNY City Coll, New York, NY 10031 USA
[3] Univ Southern Calif, Los Angeles, CA USA
[4] Calif State Univ Fullerton, Fullerton, CA 92634 USA
[5] Northeastern Univ, Boston, MA 02115 USA
Funding
U.S. National Science Foundation;
Keywords
Deep learning; block-circulant matrix; compression; acceleration; FPGA; FFT; ARCHITECTURES; DESIGN;
DOI
10.1145/3123939.3124552
CLC Classification
TP301 [Theory and Methods];
Discipline Code
081202;
Abstract
Large-scale deep neural networks (DNNs) are both compute and memory intensive. As the size of DNNs continues to grow, it is critical to improve their energy efficiency and performance while maintaining accuracy. For DNNs, the model size is an important factor affecting performance, scalability, and energy efficiency. Weight pruning achieves good compression ratios but suffers from three drawbacks: 1) the irregular network structure after pruning, which hurts performance and throughput; 2) the increased training complexity; and 3) the lack of a rigorous guarantee on compression ratio and inference accuracy. To overcome these limitations, this paper proposes CIRCNN, a principled approach to represent weights and process neural networks using block-circulant matrices. CIRCNN utilizes Fast Fourier Transform (FFT)-based fast multiplication, simultaneously reducing the computational complexity (in both inference and training) from O(n^2) to O(n log n) and the storage complexity from O(n^2) to O(n), with negligible accuracy loss. Compared to other approaches, CIRCNN is distinct due to its mathematical rigor: DNNs based on CIRCNN can converge to the same "effectiveness" as DNNs without compression. We propose the CIRCNN architecture, a universal DNN inference engine that can be implemented on various hardware/software platforms with a configurable network architecture (e.g., layer type, size, scales, etc.). In the CIRCNN architecture: 1) due to its recursive property, the FFT can be used as the key computing kernel, which ensures universal and small-footprint implementations; 2) the compressed but regular network structure avoids the pitfalls of network pruning and facilitates high performance and throughput with a highly pipelined and parallel design. To demonstrate the performance and energy efficiency, we test CIRCNN on FPGA, ASIC, and embedded processors. Our results show that the CIRCNN architecture achieves very high energy efficiency and performance with a small hardware footprint. Based on the FPGA implementation and ASIC synthesis results, CIRCNN achieves 6-102X energy efficiency improvements compared with the best state-of-the-art results.
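The core operation summarized in the abstract, multiplying a block-circulant weight matrix by an input vector via the FFT, can be illustrated with a minimal NumPy sketch. This is not the paper's FPGA/ASIC design; it is a reference computation under the assumption that each k x k block is a circulant matrix defined by its first column, so each block-vector product becomes an element-wise multiplication in the frequency domain (O(k log k) instead of O(k^2), and k values stored per block instead of k^2). The function name and block layout below are illustrative only.

import numpy as np

def block_circulant_matvec(blocks, x, k):
    # blocks: shape (p, q, k); blocks[i, j] is the defining vector (first
    # column) of the k x k circulant block W_ij of the weight matrix.
    # x: input vector of length q * k.
    # Returns y of length p * k, where y_i = sum_j W_ij @ x_j and each
    # W_ij @ x_j is computed as IFFT(FFT(w_ij) * FFT(x_j)).
    p, q, _ = blocks.shape
    x_blocks = x.reshape(q, k)
    fft_x = np.fft.fft(x_blocks, axis=1)   # FFT of each input sub-vector
    fft_w = np.fft.fft(blocks, axis=2)     # FFT of each block's defining vector
    # Element-wise product in the frequency domain, summed over the q column blocks.
    fft_y = np.einsum('pqk,qk->pk', fft_w, fft_x)
    y = np.fft.ifft(fft_y, axis=1).real    # weights and inputs are real-valued
    return y.reshape(p * k)

# Sanity check against the uncompressed dense computation.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, q, k = 2, 3, 4                       # an 8 x 12 weight matrix of 4 x 4 circulant blocks
    blocks = rng.standard_normal((p, q, k))
    x = rng.standard_normal(q * k)

    # Explicit dense matrix for comparison: W_ij[r, c] = w_ij[(r - c) mod k].
    dense = np.zeros((p * k, q * k))
    for i in range(p):
        for j in range(q):
            for r in range(k):
                for c in range(k):
                    dense[i * k + r, j * k + c] = blocks[i, j, (r - c) % k]

    assert np.allclose(dense @ x, block_circulant_matvec(blocks, x, k))

In a hardware implementation such as the one the paper describes, the per-block FFTs share a single small FFT kernel, which is what enables the small-footprint, pipelined design mentioned above.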
Pages: 395-408
Page count: 14
Related Papers
50 records in total
  • [1] Accelerating Deep Neural Networks by Combining Block-Circulant Matrices and Low-Precision Weights
    Qin, Zidi
    Zhu, Di
    Zhu, Xingwei
    Chen, Xuan
    Shi, Yinghuan
    Gao, Yang
    Lu, Zhonghai
    Shen, Qinghong
    Li, Li
    Pan, Hongbing
    [J]. ELECTRONICS, 2019, 8 (01)
  • [2] REBOC: Accelerating Block-Circulant Neural Networks in ReRAM
    Wang, Yitu
    Chen, Fan
    Song, Linghao
    Shi, C-J Richard
Li, Hai "Helen"
    Chen, Yiran
    [J]. PROCEEDINGS OF THE 2020 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2020), 2020, : 1472 - 1477
  • [3] BlockGNN: Towards Efficient GNN Acceleration Using Block-Circulant Weight Matrices
    Zhou, Zhe
    Shi, Bizhao
    Zhang, Zhe
    Guan, Yijin
    Sun, Guangyu
    Luo, Guojie
    [J]. 2021 58TH ACM/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2021, : 1009 - 1014
  • [4] Block-circulant complex Hadamard matrices
    Bruzda, W.
    [J]. JOURNAL OF MATHEMATICAL PHYSICS, 2023, 64 (05)
  • [5] Exploring GPU acceleration of Deep Neural Networks using Block Circulant Matrices
    Dong, Shi
    Zhao, Pu
    Lin, Xue
    Kaeli, David
    [J]. PARALLEL COMPUTING, 2020, 100 (100)
  • [6] Real block-circulant matrices and DCT-DST algorithm for transformer neural network
    Asriani, Euis
    Muchtadi-Alamsyah, Intan
    Purwarianti, Ayu
    [J]. FRONTIERS IN APPLIED MATHEMATICS AND STATISTICS, 2023, 9
  • [7] Energy-Efficient, High-Performance, Highly-Compressed Deep Neural Network Design using Block-Circulant Matrices
    Liao, Siyu
    Li, Zhe
    Lin, Xue
    Qiu, Qinru
    Wang, Yanzhi
    Yuan, Bo
    [J]. 2017 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD), 2017, : 458 - 465
  • [8] Block-circulant matrices for constructing optimal Latin hypercube designs
    Georgiou, S. D.
    Stylianou, S.
    [J]. JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2011, 141 (05) : 1933 - 1943
  • [9] An efficient block-circulant preconditioner for simulating fracture using large fuse networks
    Nukala, PKVV
    Simunovic, S
    [J]. JOURNAL OF PHYSICS A-MATHEMATICAL AND GENERAL, 2004, 37 (06): : 2093 - 2103
  • [10] A Unified Approximation Framework for Compressing and Accelerating Deep Neural Networks
    Ma, Yuzhe
    Chen, Ran
    Li, Wei
    Shang, Fanhua
    Yu, Wenjian
    Cho, Minsik
    Yu, Bei
    [J]. 2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2019), 2019, : 376 - 383