QTTNet: Quantized tensor train neural networks for 3D object and video recognition

Cited by: 16
Authors
Lee, Donghyun [1 ,2 ]
Wang, Dingheng [3 ]
Yang, Yukuan [1 ,2 ]
Deng, Lei [4 ]
Zhao, Guangshe [3 ]
Li, Guoqi [1 ,2 ]
Affiliations
[1] Tsinghua Univ, Ctr Brain Inspired Comp Res, Dept Precis Instrumentat, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Beijing Innovat Ctr Future Chip, Beijing 100084, Peoples R China
[3] Xi An Jiao Tong Univ, Fac Elect & Informat Engn, Sch Automat Sci & Engn, Xian 710049, Shaanxi, Peoples R China
[4] Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA
Funding
National Science Foundation (United States); National Key Research and Development Program of China;
Keywords
3DCNN; Tensor train decomposition; Neural network compression; Quantization; 8 bit inference; MOTION; ROBUST;
DOI
10.1016/j.neunet.2021.05.034
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Thanks to the rapidly increasing capacity of computing clusters and hardware, convolutional neural networks (CNNs) have been applied successfully in various fields and have achieved state-of-the-art results. Despite these developments, training and running inference with a large-scale CNN model still incurs a huge memory cost, which makes such models hard to deploy widely on resource-limited portable devices. To address this problem, we establish QTTNet, a training framework for three-dimensional convolutional neural networks (3DCNNs) that combines tensor train (TT) decomposition with data quantization to further shrink the model size and reduce memory and time cost. This framework exploits both the ability of TT to reduce the number of trainable parameters and the ability of quantization to decrease the bit-width of data, compressing 3DCNN models greatly with little accuracy degradation. In addition, because low-bit quantization is applied to all parameters during inference, including TT-cores, activations, and batch normalizations, the proposed method naturally has an advantage in memory and time cost. Experimental results of compressing 3DCNNs for 3D object and video recognition on the ModelNet40, UCF11, and UCF50 datasets verify the effectiveness of the proposed method. The best compression ratio we obtained is nearly 180x, with performance competitive with other state-of-the-art work. Moreover, the total size in bytes of our QTTNet models on the ModelNet40 and UCF11 datasets can be 1000x lower than that of some typical approaches such as MVCNN. (C) 2021 Published by Elsevier Ltd.
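The two ingredients named in the abstract, TT decomposition and low-bit quantization, can be illustrated with a minimal NumPy sketch. This is not the QTTNet training framework: it only decomposes a small synthetic tensor after the fact with the standard sequential-SVD procedure and then applies a simple symmetric 8-bit quantizer to one TT-core. The function names (tt_svd, tt_reconstruct, quantize_uniform), the tensor shape, and the TT-rank of 2 are all assumptions chosen for illustration and do not come from the paper.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Sequential truncated SVD; returns TT-cores shaped (r_prev, n_k, r_next)."""
    dims = tensor.shape
    cores, r_prev = [], 1
    mat = tensor.reshape(r_prev * dims[0], -1)
    for k in range(len(dims) - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))  # truncate to the requested TT-rank
        cores.append(u[:, :r].reshape(r_prev, dims[k], r))
        # Carry the remaining factor forward and fold in the next mode.
        mat = (s[:r, None] * vt[:r]).reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT-cores back into a dense array."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.squeeze(axis=(0, -1))  # drop the boundary ranks of size 1

def quantize_uniform(x, bits=8):
    """Symmetric uniform quantizer (illustrative only; valid for bits <= 8)."""
    qmax = 2 ** (bits - 1) - 1
    scale = float(np.max(np.abs(x))) / qmax
    if scale == 0.0:
        scale = 1.0
    codes = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return codes, scale

# Hypothetical example: a 4x4x4x4 tensor built to have TT-rank 2.
rng = np.random.default_rng(0)
true_shapes = [(1, 4, 2), (2, 4, 2), (2, 4, 2), (2, 4, 1)]
tensor = tt_reconstruct([rng.standard_normal(s) for s in true_shapes])

cores = tt_svd(tensor, max_rank=2)
dense_params = tensor.size                # 256 entries in the dense tensor
tt_params = sum(c.size for c in cores)    # 48 entries across the TT-cores
codes, scale = quantize_uniform(cores[1])
```

In this toy setting the parameter count drops from 256 to 48 because the tensor was constructed to have low TT-rank, and storing the cores as 8-bit codes instead of 32-bit floats would shrink them by a further factor of about four. Real network weights are generally not exactly low-rank, so truncation and quantization both introduce error, which is why the abstract reports accuracy alongside compression ratio.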
Pages: 420 - 432
Page count: 13