QTTNet: Quantized tensor train neural networks for 3D object and video recognition

Cited by: 16

Authors:

Lee, Donghyun [1,2]

Wang, Dingheng [3]

Yang, Yukuan [1,2]

Deng, Lei [4]

Zhao, Guangshe [3]

Li, Guoqi [1,2]

Affiliations:

[1] Tsinghua Univ, Ctr Brain Inspired Comp Res, Dept Precis Instrumentat, Beijing 100084, Peoples R China

[2] Tsinghua Univ, Beijing Innovat Ctr Future Chip, Beijing 100084, Peoples R China

[3] Xi An Jiao Tong Univ, Fac Elect & Informat Engn, Sch Automat Sci & Engn, Xian 710049, Shaanxi, Peoples R China

[4] Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA

Source:

NEURAL NETWORKS | 2021 / Vol. 141

Funding:

U.S. National Science Foundation; National Key Research and Development Program of China;

Keywords:

3DCNN; Tensor train decomposition; Neural network compression; Quantization; 8 bit inference; MOTION; ROBUST;

DOI:

10.1016/j.neunet.2021.05.034

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

Relying on the rapidly increasing capacity of computing clusters and hardware, convolutional neural networks (CNNs) have been successfully applied in various fields and have achieved state-of-the-art results. Despite these exciting developments, training and inference of large-scale CNN models still incur a huge memory cost, which makes such models hard to deploy widely on resource-limited portable devices. To address this problem, we establish a training framework for three-dimensional convolutional neural networks (3DCNNs), named QTTNet, that combines tensor train (TT) decomposition with data quantization to further shrink the model size and reduce memory and time costs. Through this framework, we can fully exploit the strength of TT decomposition in reducing the number of trainable parameters and the advantage of quantization in reducing the bit-width of data, compressing 3DCNN models substantially with little accuracy degradation. In addition, because all data involved in inference, including the TT-cores, activations, and batch normalization parameters, are quantized to low bit-widths, the proposed method naturally has an advantage in memory and time costs. Experimental results of compressing 3DCNNs for 3D object and video recognition on the ModelNet40, UCF11, and UCF50 datasets verify the effectiveness of the proposed method. The best compression ratio we obtained is nearly 180x, with performance competitive with other state-of-the-art methods. Moreover, the total size in bytes of our QTTNet models on the ModelNet40 and UCF11 datasets can be 1000x smaller than that of typical approaches such as MVCNN. (C) 2021 Published by Elsevier Ltd.
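To make concrete how TT decomposition and quantization compound, here is a minimal sketch in Python. It assumes the standard TT-matrix format for a single weight matrix and a simple per-tensor symmetric 8-bit quantizer. The mode sizes, TT-ranks, and helper names (`dense_params`, `tt_params`, `quantize_symmetric`, `dequantize`) are hypothetical illustrations, not the authors' code; the paper applies TT to 3D convolution kernels, and its exact reshaping, ranks, and quantization scheme are not given in this record.

```python
import math
from typing import List, Sequence, Tuple


def dense_params(in_modes: Sequence[int], out_modes: Sequence[int]) -> int:
    """Parameter count of a dense matrix of shape (prod(in_modes), prod(out_modes))."""
    return math.prod(in_modes) * math.prod(out_modes)


def tt_params(in_modes: Sequence[int], out_modes: Sequence[int],
              ranks: Sequence[int]) -> int:
    """Parameter count of a TT-matrix whose k-th core has shape
    (ranks[k], in_modes[k], out_modes[k], ranks[k + 1]).

    `ranks` has length d + 1 with ranks[0] == ranks[-1] == 1 (boundary ranks).
    """
    d = len(in_modes)
    if len(out_modes) != d or len(ranks) != d + 1 or ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("inconsistent TT shapes")
    return sum(ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1] for k in range(d))


def quantize_symmetric(values: Sequence[float], bits: int = 8) -> Tuple[List[int], float]:
    """Map floats to signed integers in [-(2^(bits-1) - 1), 2^(bits-1) - 1]
    with one per-tensor scale (uniform, symmetric, round-to-nearest)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Approximate reconstruction of the original floats from integer codes."""
    return [x * scale for x in q]


if __name__ == "__main__":
    # Hypothetical layer: a 1024 x 1024 weight matrix viewed as (4*8*8*4) x (4*8*8*4).
    in_modes, out_modes, ranks = (4, 8, 8, 4), (4, 8, 8, 4), (1, 8, 8, 8, 1)
    n_dense = dense_params(in_modes, out_modes)
    n_tt = tt_params(in_modes, out_modes, ranks)
    print("dense params:", n_dense, "TT params:", n_tt)
    print(f"parameter ratio: {n_dense / n_tt:.1f}x")
    # Bytes: dense weights as 32-bit floats vs. TT-cores as 8-bit integers
    # (ignoring the few bytes needed to store the quantization scales).
    print(f"byte ratio (fp32 dense vs int8 TT): {4 * n_dense / n_tt:.1f}x")

    q, s = quantize_symmetric([0.5, -1.0, 0.25], bits=8)
    print("int8 codes:", q, "reconstruction:", dequantize(q, s))
```

With these hypothetical shapes the dense matrix has 1,048,576 parameters and the TT-matrix 8,448, a parameter ratio of roughly 124x; storing the TT-cores in 8 bits instead of 32-bit floats multiplies the byte-level ratio by a further 4x. The numbers only illustrate the arithmetic and are not results reported in the paper.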


Pages: 420-432

Page count: 13
