Mokey: Enabling Narrow Fixed-Point Inference for Out-of-the-Box Floating-Point Transformer Models

Cited by: 11

Authors:

Zadeh, Ali Hadi [1]

Mahmoud, Mostafa [2]

Abdelhadi, Ameer [2]

Moshovos, Andreas [1]

Affiliations:

[1] Univ Toronto, Vector Inst, Toronto, ON, Canada

[2] Univ Toronto, Toronto, ON, Canada

Source:

PROCEEDINGS OF THE 2022 THE 49TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA '22) | 2022

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

Quantization; Natural Language Processing; Transformer Models;

DOI:

10.1145/3470496.3527438

Chinese Library Classification:

TP3 [Computing Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Increasingly larger and better Transformer models keep advancing state-of-the-art accuracy and capability for Natural Language Processing applications. These models demand more computational power, storage, and energy. Mokey reduces the footprint of state-of-the-art 32-bit or 16-bit floating-point transformer models by quantizing all values to 4-bit indexes into dictionaries of representative 16-bit fixed-point centroids. Mokey does not need fine-tuning, an essential feature as often the training resources or datasets are not available to many. Exploiting the range of values that naturally occur in transformer models, Mokey selects centroid values to also fit an exponential curve. This unique feature enables Mokey to replace the bulk of the original multiply-accumulate operations with narrow 3b fixed-point additions resulting in an area- and energy-efficient hardware accelerator design. Over a set of state-of-the-art transformer models, the Mokey accelerator delivers an order of magnitude improvements in energy efficiency over a Tensor Cores-based accelerator while improving performance by at least 4x and as much as 15x depending on the model and on-chip buffering capacity. Optionally, Mokey can be used as memory compression assist for any other accelerator transparently stashing wide floating-point or fixed-point activations or weights into narrow 4-bit indexes. Mokey proves superior to prior state-of-the-art quantization methods for Transformers.
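The abstract's central idea, that centroids lying on an exponential curve let multiplications be replaced by narrow index additions, can be illustrated with a small sketch. This is not the paper's actual scheme: the curve base `BASE`, the per-tensor `scale` arguments, the 1 sign bit plus 3-bit magnitude split, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

BASE = 1.5    # hypothetical curve base; not taken from the paper
LEVELS = 8    # assumed split: 3-bit magnitude index + 1 sign bit = 4-bit code

def quantize(x, scale):
    """Map each value to (sign, index) of the nearest centroid scale * BASE**index.

    Sketch limitation: zero is mapped to the smallest positive centroid.
    """
    x = np.asarray(x, dtype=np.float64)
    centroids = scale * BASE ** np.arange(LEVELS)
    idx = np.abs(np.abs(x)[..., None] - centroids).argmin(axis=-1)
    sign = np.where(x < 0, -1, 1)
    return sign, idx

def dequantize(sign, idx, scale):
    """Recover the centroid value that each (sign, index) pair stands for."""
    return sign * scale * BASE ** idx

def dot_via_index_addition(sa, ia, scale_a, sb, ib, scale_b):
    """Dot product of two quantized vectors without per-element multiplies.

    Because BASE**i * BASE**j == BASE**(i + j), each product is identified by
    the sum of two small indexes. We count (with sign) how often each sum
    occurs, then weight the counts by the matching power once at the end.
    """
    sums = ia + ib                       # narrow integer additions
    signs = sa * sb                      # sign of each product
    counts = np.bincount(sums, weights=signs, minlength=2 * LEVELS - 1)
    powers = BASE ** np.arange(2 * LEVELS - 1)
    return scale_a * scale_b * float(counts @ powers)
```

Under these assumptions, `dot_via_index_addition` returns the same value as an ordinary dot product of the dequantized vectors, while the per-element work reduces to adding two 3-bit indexes and updating a counter.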


Pages: 888-901

Page count: 14
