Octo: INT8 Training with Loss-aware Compensation and Backward Quantization for Tiny On-device Learning

Cited by: 0
Authors
Zhou, Qihua [1 ]
Guo, Song [1 ]
Qu, Zhihao [2 ]
Guo, Jingcai [1 ]
Xu, Zhenda [1 ]
Zhang, Jiewei [1 ]
Guo, Tao [1 ]
Luo, Boyuan [1 ]
Zhou, Jingren [3 ]
Affiliations
[1] Hong Kong Polytech Univ, Hong Kong, Peoples R China
[2] Hohai Univ, Nanjing, Peoples R China
[3] Alibaba Grp, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
On-device learning is an emerging technique to pave the last mile of enabling edge intelligence, eliminating the limitations of conventional in-cloud computing, where large amounts of computational capacity and memory are needed. A high-performance on-device learning system must break the constraints of limited resources and alleviate computational overhead. In this paper, we show that employing 8-bit fixed-point (INT8) quantization in both the forward and backward passes of a deep model is a promising way to enable tiny on-device learning in practice. The key to an efficient quantization-aware training method is to exploit hardware-enabled acceleration while preserving the training quality of each layer. However, off-the-shelf quantization methods cannot handle the fixed-point processing required by the on-device learning paradigm. To overcome these challenges, we propose a novel INT8 training method that optimizes the computation of the forward and backward passes via the delicately designed Loss-aware Compensation (LAC) and Parameterized Range Clipping (PRC), respectively. Specifically, we build a new network component, the compensation layer, to automatically counteract the quantization error of tensor arithmetic. We implement our method in Octo, a lightweight cross-platform system for tiny on-device learning. Evaluation on commercial AI chips shows that Octo achieves higher training efficiency than state-of-the-art quantization training methods, while achieving adequate processing speedup and memory reduction over full-precision training.
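To make the quantization step described in the abstract concrete, the following is a minimal sketch of symmetric per-tensor INT8 fake quantization with a fixed clipping threshold, written in NumPy. The function names, the choice of symmetric quantization, and the `clip` parameter are illustrative assumptions for exposition; they are not the paper's LAC/PRC implementation, which additionally learns the clipping range and compensates the resulting error through a dedicated layer.

```python
import numpy as np

def int8_quantize(x, clip):
    """Symmetric per-tensor INT8 fake quantization (illustrative sketch).

    `clip` plays the role of a range-clipping threshold: values outside
    [-clip, clip] are saturated before being mapped onto the 8-bit grid.
    """
    scale = clip / 127.0                              # step size of the INT8 grid
    x_clipped = np.clip(x, -clip, clip)               # range clipping
    q = np.round(x_clipped / scale).astype(np.int8)   # quantize to INT8
    return q, scale

def int8_dequantize(q, scale):
    """Map INT8 values back to floating point for error measurement."""
    return q.astype(np.float32) * scale

# Toy example: quantize an activation tensor and measure the residual
# quantization error that a compensation mechanism would aim to counteract.
x = np.random.randn(4, 8).astype(np.float32)
q, scale = int8_quantize(x, clip=3.0)
x_hat = int8_dequantize(q, scale)
quant_error = x - x_hat
print("mean abs quantization error:", np.abs(quant_error).mean())
```

In this sketch the threshold is a hand-picked constant; the point of a parameterized clipping scheme is to choose that range per tensor during training, and the point of a compensation component is to model the `quant_error` residual rather than ignore it.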
Pages: 365-380
Number of pages: 16