Mokey: Enabling Narrow Fixed-Point Inference for Out-of-the-Box Floating-Point Transformer Models

Cited by: 9
Authors
Zadeh, Ali Hadi [1]
Mahmoud, Mostafa [2]
Abdelhadi, Ameer [2]
Moshovos, Andreas [1]
Affiliations
[1] Univ Toronto, Vector Inst, Toronto, ON, Canada
[2] Univ Toronto, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Quantization; Natural Language Processing; Transformer Models
DOI
10.1145/3470496.3527438
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Increasingly large and capable Transformer models keep advancing state-of-the-art accuracy and capability for Natural Language Processing applications. These models demand more computational power, storage, and energy. Mokey reduces the footprint of state-of-the-art 32-bit or 16-bit floating-point transformer models by quantizing all values to 4-bit indexes into dictionaries of representative 16-bit fixed-point centroids. Mokey does not need fine-tuning, an essential feature since the training resources or datasets are often unavailable. Exploiting the range of values that naturally occur in transformer models, Mokey selects centroid values that also fit an exponential curve. This feature enables Mokey to replace the bulk of the original multiply-accumulate operations with narrow 3-bit fixed-point additions, resulting in an area- and energy-efficient hardware accelerator design. Over a set of state-of-the-art transformer models, the Mokey accelerator delivers an order-of-magnitude improvement in energy efficiency over a Tensor Cores-based accelerator while improving performance by at least 4x and as much as 15x, depending on the model and on-chip buffering capacity. Optionally, Mokey can be used as a memory compression assist for any other accelerator, transparently stashing wide floating-point or fixed-point activations or weights into narrow 4-bit indexes. Mokey proves superior to prior state-of-the-art quantization methods for Transformers.
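The idea of indexing into exponentially spaced centroids can be illustrated with a minimal sketch. It is not Mokey's actual algorithm: it assumes, purely for illustration, that each 4-bit index holds one sign bit and a 3-bit exponent k standing for the value ±scale · BASE^k. The names `BASE`, `quantize`, `dequantize`, and `dot_indexed`, and the choice of base 1.5, are hypothetical. Under that assumption, multiplying two centroids reduces to adding two 3-bit exponents, and a dot product reduces to counting how often each exponent sum occurs.

```python
import math

# Hypothetical parameters, chosen only for illustration.
BASE = 1.5                 # growth factor of the exponential centroid curve
EXP_BITS = 3               # 3-bit exponent plus 1 sign bit gives a 4-bit index
LEVELS = 1 << EXP_BITS     # 8 magnitudes per sign


def quantize(x, scale):
    """Map a float to a 4-bit code standing for +/- scale * BASE**k."""
    sign = 1 if x < 0 else 0
    mag = abs(x) / scale
    if mag <= 1.0:
        k = 0  # smallest representable magnitude; this sketch has no exact zero
    else:
        k = min(LEVELS - 1, round(math.log(mag, BASE)))
    return (sign << EXP_BITS) | k


def dequantize(code, scale):
    """Look up the centroid value that a 4-bit code stands for."""
    sign = -1.0 if code >> EXP_BITS else 1.0
    return sign * scale * BASE ** (code & (LEVELS - 1))


def dot_indexed(codes_a, codes_b, scale_a, scale_b):
    """Dot product computed on indexes: exponent additions replace multiplies."""
    counts = [0] * (2 * LEVELS - 1)  # signed count per exponent sum 0..14
    for ca, cb in zip(codes_a, codes_b):
        exp_sum = (ca & (LEVELS - 1)) + (cb & (LEVELS - 1))  # narrow addition
        negative = (ca >> EXP_BITS) ^ (cb >> EXP_BITS)       # sign via XOR
        counts[exp_sum] += -1 if negative else 1
    # One weighted sum per vector pair instead of one multiply per element.
    return scale_a * scale_b * sum(c * BASE ** s for s, c in enumerate(counts))


a = [0.3, -1.2, 2.5, -0.1]
b = [1.0, 0.7, -3.0, 0.2]
codes_a = [quantize(x, 0.125) for x in a]
codes_b = [quantize(x, 0.125) for x in b]
approx = dot_indexed(codes_a, codes_b, 0.125, 0.125)
```

In this sketch the inner loop touches only small integers; the wide arithmetic is deferred to a single weighted sum over 15 counters, which hints at why such a scheme could suit narrow hardware datapaths.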
Pages: 888-901
Page count: 14