Tetris: Re-architecting Convolutional Neural Network Computation for Machine Learning Accelerators

Cited by: 29
Authors
Lu, Hang [1 ,2 ]
Wei, Xin [2 ]
Lin, Ning [2 ]
Yan, Guihai [1 ,2 ]
Li, Xiao-Wei [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
DOI
10.1145/3240765.3240855
CLC Number
TP301 [Theory, Methods];
Discipline Code
081202;
Abstract
Inference efficiency is the predominant consideration in designing deep learning accelerators. Previous work mainly focuses on skipping zero values to eliminate ineffectual computation, while zero bits in non-zero values, another major source of ineffectual computation, are often ignored. The reason lies in the difficulty of extracting the essential bits during multiply-and-accumulate (MAC) operations in the processing element. Based on the observation that zero bits account for as much as 68.9% of the bits in the weights of modern deep convolutional neural network models, this paper first proposes a weight kneading technique that simultaneously eliminates the ineffectual computation caused by both zero-value weights and zero bits in non-zero weights. In addition, a split-and-accumulate (SAC) computing pattern that replaces the conventional MAC, together with the corresponding hardware accelerator design called Tetris, is proposed to support weight kneading at the hardware level. Experimental results show that Tetris speeds up inference by up to 1.50x and improves power efficiency by up to 5.33x compared with state-of-the-art baselines.
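The intuition behind split-and-accumulate can be illustrated in a few lines. The sketch below is not the paper's implementation (which kneads weights in hardware); it merely shows, for non-negative integer weights, why zero bits are ineffectual work: a SAC-style kernel adds shifted copies of the activation only at the weight's essential (set) bit positions, so zero weights and zero bits contribute no operations, yet the result matches a conventional MAC.

```python
def essential_bits(w: int) -> list[int]:
    """Positions of the set bits in a non-negative integer weight."""
    return [i for i in range(w.bit_length()) if (w >> i) & 1]

def mac(activations, weights):
    """Conventional multiply-and-accumulate: one multiply per weight."""
    return sum(a * w for a, w in zip(activations, weights))

def sac(activations, weights):
    """Split-and-accumulate: shift-adds over essential bits only."""
    acc = 0
    for a, w in zip(activations, weights):
        for bit in essential_bits(w):   # a zero weight yields no bits at all
            acc += a << bit             # a * 2^bit realized as a shift
    return acc

acts = [3, 0, 7, 1]
wts  = [5, 9, 0, 6]   # 5 = 0b101, 9 = 0b1001, 6 = 0b110
assert sac(acts, wts) == mac(acts, wts)  # both give 3*5 + 0*9 + 7*0 + 1*6 = 21
```

Here the MAC performs four multiplies, while SAC performs six shift-adds and skips the zero weight entirely; the accelerator's benefit comes from the essential-bit count being far below the full bit width on average.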
Pages: 8
Related Papers
50 results
  • [31] Forward Learning Convolutional Neural Network
    Hu, Hong
    Hong, Xin
    Hou, Dan Yang
    Shi, Zhongzhi
    INTELLIGENT INFORMATION PROCESSING IX, 2018, 538 : 51 - 61
  • [32] Learning Pooling for Convolutional Neural Network
    Sun, Manli
    Song, Zhanjie
    Jiang, Xiaoheng
    Pan, Jing
    Pang, Yanwei
    NEUROCOMPUTING, 2017, 224 : 96 - 104
  • [33] Extended Bit-Plane Compression for Convolutional Neural Network Accelerators
    Cavigelli, Lukas
    Benini, Luca
    2019 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2019), 2019, : 279 - 283
  • [34] Parallel Convolutional Neural Network (CNN) Accelerators Based on Stochastic Computing
    Zhang, Yawen
    Zhang, Xinyue
    Song, Jiahao
    Wang, Yuan
    Huang, Ru
    Wang, Runsheng
    PROCEEDINGS OF THE 2019 IEEE INTERNATIONAL WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS 2019), 2019, : 19 - 24
  • [35] CNNWire: Boosting Convolutional Neural Network with Winograd on ReRAM based Accelerators
    Lin, Jilan
    Li, Shuangchen
    Hu, Xing
    Deng, Lei
    Xie, Yuan
    GLSVLSI '19 - PROCEEDINGS OF THE 2019 ON GREAT LAKES SYMPOSIUM ON VLSI, 2019, : 283 - 286
  • [36] Spatial Data Dependence Graph Simulator for Convolutional Neural Network Accelerators
    Wang, Jooho
    Kim, Jiwon
    Moon, Sungmin
    Kim, Sunwoo
    Park, Sungkyung
    Park, Chester Sungchung
    2019 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2019), 2019, : 309 - 310
  • [37] Hardware Accelerators for a Convolutional Neural Network in Condition Monitoring of CNC Machines
    Hoyer, Ingo
    Berg, Oscar
    Krupp, Lukas
    Utz, Alexander
    Wiede, Christian
    Seidl, Karsten
    2023 IEEE SENSORS, 2023,
  • [38] A Feature Map Lossless Compression Framework for Convolutional Neural Network Accelerators
    Zhang, Zekun
    Jiao, Xin
    Xu, Chengyu
    2024 IEEE 6TH INTERNATIONAL CONFERENCE ON AI CIRCUITS AND SYSTEMS, AICAS 2024, 2024, : 422 - 426
  • [39] An efficient loop tiling framework for convolutional neural network inference accelerators
    Huang, Hongmin
    Hu, Xianghong
    Li, Xueming
    Xiong, Xiaoming
    IET CIRCUITS DEVICES & SYSTEMS, 2022, 16 (01) : 116 - 123
  • [40] Exploiting Variable Precision Computation Array for Scalable Neural Network Accelerators
    Yang, Shaofei
    Liu, Longjun
    Li, Baoting
    Sun, Hongbin
    Zheng, Nanning
    2020 2ND IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2020), 2020, : 315 - 319