Mokey: Enabling Narrow Fixed-Point Inference for Out-of-the-Box Floating-Point Transformer Models

Cited by: 11
Authors
Zadeh, Ali Hadi [1]
Mahmoud, Mostafa [2]
Abdelhadi, Ameer [2]
Moshovos, Andreas [1]
Affiliations
[1] Univ Toronto, Vector Inst, Toronto, ON, Canada
[2] Univ Toronto, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Quantization; Natural Language Processing; Transformer Models
DOI
10.1145/3470496.3527438
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Increasingly larger and better Transformer models keep advancing state-of-the-art accuracy and capability for Natural Language Processing applications. These models demand more computational power, storage, and energy. Mokey reduces the footprint of state-of-the-art 32-bit or 16-bit floating-point transformer models by quantizing all values to 4-bit indexes into dictionaries of representative 16-bit fixed-point centroids. Mokey does not need fine-tuning, an essential feature because training resources or datasets are often unavailable to many. Exploiting the range of values that naturally occur in transformer models, Mokey selects centroid values to also fit an exponential curve. This unique feature enables Mokey to replace the bulk of the original multiply-accumulate operations with narrow 3-bit fixed-point additions, resulting in an area- and energy-efficient hardware accelerator design. Over a set of state-of-the-art transformer models, the Mokey accelerator delivers an order-of-magnitude improvement in energy efficiency over a Tensor Cores-based accelerator while improving performance by at least 4x and as much as 15x, depending on the model and on-chip buffering capacity. Optionally, Mokey can be used as a memory compression assist for any other accelerator, transparently stashing wide floating-point or fixed-point activations or weights into narrow 4-bit indexes. Mokey proves superior to prior state-of-the-art quantization methods for Transformers.
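The abstract's central idea, that centroids lying on an exponential curve let multiplications be replaced by narrow index additions, can be illustrated with a small sketch. This is not Mokey's actual algorithm: the abstract does not give the centroid formula, so the dictionary layout (one sign bit plus a 3-bit exponent index), the `base` and `scale` parameters, and all function names below are assumptions made only for illustration.

```python
import numpy as np

def make_dictionary(base=1.5, scale=1.0):
    """Build 16 centroids (hypothetical layout): indexes 0..7 are
    +scale*base**k, indexes 8..15 are -scale*base**k, for k = 0..7."""
    mags = scale * base ** np.arange(8)
    return np.concatenate([mags, -mags])

def quantize(x, dictionary):
    """Map each value to the 4-bit index of its nearest centroid."""
    dist = np.abs(x[..., None] - dictionary)
    return dist.argmin(axis=-1).astype(np.uint8)

def dot_via_index_addition(ia, iw, base=1.5, scale=1.0):
    """Dot product of two quantized vectors without per-element multiplies.

    Because base**i * base**j == base**(i + j), each product is identified
    by the sum of two 3-bit exponent indexes. We count how often each sum
    occurs (with sign) and apply the 15 possible powers once at the end.
    """
    sign = np.where((ia >> 3) ^ (iw >> 3), -1, 1)   # XOR of sign bits
    ksum = (ia & 7) + (iw & 7)                      # 3-bit additions, 0..14
    counts = np.bincount(ksum, weights=sign, minlength=15)
    return scale ** 2 * np.dot(counts, base ** np.arange(15))

rng = np.random.default_rng(0)
dictionary = make_dictionary()
acts = rng.normal(size=64) * 4
weights = rng.normal(size=64) * 4
ia, iw = quantize(acts, dictionary), quantize(weights, dictionary)

# Reference: look up the centroids and multiply-accumulate as usual.
reference = float(np.dot(dictionary[ia], dictionary[iw]))
via_addition = float(dot_via_index_addition(ia, iw))
print(reference, via_addition)
```

The two printed values should agree up to floating-point rounding, since both compute the same sum over the same centroids. The sketch only shows why an exponential dictionary makes this substitution possible; it says nothing about the accuracy or hardware results reported in the abstract.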
Pages: 888-901
Number of pages: 14