Octo: INT8 Training with Loss-aware Compensation and Backward Quantization for Tiny On-device Learning

Cited by: 0
Authors
Zhou, Qihua [1 ]
Guo, Song [1 ]
Qu, Zhihao [2 ]
Guo, Jingcai [1 ]
Xu, Zhenda [1 ]
Zhang, Jiewei [1 ]
Guo, Tao [1 ]
Luo, Boyuan [1 ]
Zhou, Jingren [3 ]
Affiliations
[1] Hong Kong Polytech Univ, Hong Kong, Peoples R China
[2] Hohai Univ, Nanjing, Peoples R China
[3] Alibaba Grp, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
On-device learning is an emerging technique to pave the last mile of enabling edge intelligence, eliminating the limitations of conventional in-cloud computing, where large amounts of computational capacity and memory are needed. A high-performance on-device learning system must break the constraints of limited resources and alleviate computational overhead. In this paper, we show that employing 8-bit fixed-point (INT8) quantization in both the forward and backward passes of a deep model is a promising way to enable tiny on-device learning in practice. The key to an efficient quantization-aware training method is to exploit hardware-enabled acceleration while preserving the training quality of each layer. However, off-the-shelf quantization methods cannot handle the fixed-point processing required by the on-device learning paradigm. To overcome these challenges, we propose a novel INT8 training method that optimizes the computation of the forward and backward passes via the delicately designed Loss-aware Compensation (LAC) and Parameterized Range Clipping (PRC), respectively. Specifically, we build a new network component, the compensation layer, to automatically counteract the quantization error of tensor arithmetic. We implement our method in Octo, a lightweight cross-platform system for tiny on-device learning. Evaluation on commercial AI chips shows that Octo achieves higher training efficiency than state-of-the-art quantization training methods, while achieving adequate processing speedup and memory reduction over full-precision training.
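To make the quantization step described in the abstract concrete, the following is a minimal sketch of symmetric per-tensor INT8 fake quantization with a fixed clipping threshold, written in NumPy. The function names, the choice of symmetric quantization, and the `clip` parameter are illustrative assumptions for exposition; they are not the paper's LAC/PRC implementation, which additionally learns the clipping range and compensates the resulting error through a dedicated layer.

```python
import numpy as np

def int8_quantize(x, clip):
    """Symmetric per-tensor INT8 fake quantization (illustrative sketch).

    `clip` plays the role of a range-clipping threshold: values outside
    [-clip, clip] are saturated before being mapped onto the 8-bit grid.
    """
    scale = clip / 127.0                              # step size of the INT8 grid
    x_clipped = np.clip(x, -clip, clip)               # range clipping
    q = np.round(x_clipped / scale).astype(np.int8)   # quantize to INT8
    return q, scale

def int8_dequantize(q, scale):
    """Map INT8 values back to floating point for error measurement."""
    return q.astype(np.float32) * scale

# Toy example: quantize an activation tensor and measure the residual
# quantization error that a compensation mechanism would aim to counteract.
x = np.random.randn(4, 8).astype(np.float32)
q, scale = int8_quantize(x, clip=3.0)
x_hat = int8_dequantize(q, scale)
quant_error = x - x_hat
print("mean abs quantization error:", np.abs(quant_error).mean())
```

In this sketch the threshold is a hand-picked constant; the point of a parameterized clipping scheme is to choose that range per tensor during training, and the point of a compensation component is to model the `quant_error` residual rather than ignore it.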
Pages: 365-380
Number of pages: 16