Mokey: Enabling Narrow Fixed-Point Inference for Out-of-the-Box Floating-Point Transformer Models

Cited by: 9
Authors
Zadeh, Ali Hadi [1]
Mahmoud, Mostafa [2]
Abdelhadi, Ameer [2]
Moshovos, Andreas [1]
Affiliations
[1] Univ Toronto, Vector Inst, Toronto, ON, Canada
[2] Univ Toronto, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Quantization; Natural Language Processing; Transformer Models
DOI
10.1145/3470496.3527438
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Increasingly larger and better Transformer models keep advancing state-of-the-art accuracy and capability for Natural Language Processing applications. These models demand more computational power, storage, and energy. Mokey reduces the footprint of state-of-the-art 32-bit or 16-bit floating-point transformer models by quantizing all values to 4-bit indexes into dictionaries of representative 16-bit fixed-point centroids. Mokey does not need fine-tuning, an essential feature because the training resources or datasets are often not available to many users. Exploiting the range of values that naturally occur in transformer models, Mokey selects centroid values that also fit an exponential curve. This unique feature enables Mokey to replace the bulk of the original multiply-accumulate operations with narrow 3-bit fixed-point additions, resulting in an area- and energy-efficient hardware accelerator design. Over a set of state-of-the-art transformer models, the Mokey accelerator delivers an order-of-magnitude improvement in energy efficiency over a Tensor Cores-based accelerator while improving performance by at least 4x and as much as 15x, depending on the model and on-chip buffering capacity. Optionally, Mokey can be used as a memory compression assist for any other accelerator, transparently stashing wide floating-point or fixed-point activations or weights into narrow 4-bit indexes. Mokey proves superior to prior state-of-the-art quantization methods for Transformers.
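
The idea in the abstract, that centroids lying on an exponential curve let multiplications be replaced by narrow index additions, can be illustrated with a small sketch. This is not the paper's algorithm: the pure-exponential centroid form `scale * BASE**k`, the value of `BASE`, the sign-plus-3-bit-index split of the 4-bit code, and all function names are assumptions made only for illustration.

```python
import math

BASE = 1.5   # assumed growth rate of the exponential curve (hypothetical value)
LEVELS = 8   # 3-bit magnitude index; the fourth bit of the code holds the sign


def quantize(values, scale):
    """Map each value to (sign, k) so that |value| is close to scale * BASE**k."""
    codes = []
    for v in values:
        sign = -1 if v < 0 else 1
        mag = max(abs(v) / scale, 1e-12)        # avoid log(0)
        k = round(math.log(mag, BASE))          # nearest exponent index
        k = min(max(k, 0), LEVELS - 1)          # clamp to the 3-bit range
        codes.append((sign, k))
    return codes


def dequantize(codes, scale):
    """Look up the centroid for each (sign, k) code."""
    return [s * scale * BASE**k for s, k in codes]


def dot_by_index_addition(a_codes, w_codes, a_scale, w_scale):
    """Dot product where each pair costs one index addition and a counter update."""
    hist = [0] * (2 * LEVELS - 1)               # one signed counter per exponent sum
    for (sa, ka), (sw, kw) in zip(a_codes, w_codes):
        hist[ka + kw] += sa * sw                # BASE**ka * BASE**kw == BASE**(ka + kw)
    # Wide multiplications happen once per histogram bin, not once per pair.
    return a_scale * w_scale * sum(c * BASE**e for e, c in enumerate(hist))
```

In this simplified form, values near zero are clamped to the smallest centroid rather than represented exactly, and a single scale is shared by the whole tensor; a real design would handle both more carefully.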
Pages: 888-901
Page count: 14