NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference

Cited by: 11
Authors
Yu, Joonsang [1]
Park, Junki [2]
Park, Seongmin [3]
Kim, Minsoo [3]
Lee, Sihwa [3]
Lee, Dong Hyun [2]
Choi, Jungwook [3]
Affiliations
[1] NAVER Clova, Seongnam, South Korea
[2] Samsung Adv Inst Technol, Mountain View, CA USA
[3] Hanyang Univ, Seoul, South Korea
Keywords
Neural network; Transformer; Non-linear function; Look-up table
DOI
10.1145/3489517.3530505
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Non-linear operations such as GELU, layer normalization, and Softmax are essential yet costly building blocks of Transformer models. Several prior works simplified these operations with look-up tables or integer computations, but such approximations suffer from inferior accuracy or incur considerable hardware cost and long latency. This paper proposes an accurate and hardware-friendly approximation framework for efficient Transformer inference. Our framework employs a simple neural network as a universal approximator, with its structure equivalently transformed into a look-up table (LUT). The proposed framework, called Neural network generated LUT (NN-LUT), can accurately replace all the non-linear operations in popular BERT models with significant reductions in area, power consumption, and latency.
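The abstract's central idea, that a simple neural network can be rewritten exactly as a look-up table, can be illustrated with a minimal sketch. It assumes a one-hidden-layer ReLU network with scalar input and output; the abstract does not specify the architecture, and this is not the authors' implementation. The function names (`relu_net`, `net_to_lut`, `lut_eval`) are hypothetical, and the weights are random placeholders rather than a network trained to fit GELU or Softmax.

```python
import numpy as np

def relu_net(x, w, b, v, c):
    """Assumed network form: y = sum_i v[i] * relu(w[i] * x + b[i]) + c."""
    x = np.asarray(x, dtype=float)
    return np.maximum(w * x[..., None] + b, 0.0) @ v + c

def net_to_lut(w, b, v, c):
    """Rewrite the network as (breakpoints, slopes, intercepts).

    Hidden unit i switches on or off at x = -b[i] / w[i] (w[i] must be
    nonzero), so the network is exactly linear between consecutive breakpoints.
    """
    breakpoints = np.sort(-b / w)
    # One probe point strictly inside each of the len(breakpoints) + 1 segments.
    probes = np.concatenate((
        [breakpoints[0] - 1.0],
        (breakpoints[:-1] + breakpoints[1:]) / 2.0,
        [breakpoints[-1] + 1.0],
    ))
    active = (w * probes[:, None] + b) > 0  # shape: (segments, hidden units)
    slopes = (active * (v * w)).sum(axis=1)
    intercepts = (active * (v * b)).sum(axis=1) + c
    return breakpoints, slopes, intercepts

def lut_eval(x, breakpoints, slopes, intercepts):
    """Evaluate the table: find the segment, then one multiply and one add."""
    x = np.asarray(x, dtype=float)
    idx = np.searchsorted(breakpoints, x)
    return slopes[idx] * x + intercepts[idx]

# Placeholder weights for 8 hidden units (random, not trained).
rng = np.random.default_rng(0)
w, b, v = rng.normal(size=(3, 8))
c = 0.1

lut = net_to_lut(w, b, v, c)
xs = np.linspace(-5.0, 5.0, 1001)
max_err = np.max(np.abs(relu_net(xs, w, b, v, c) - lut_eval(xs, *lut)))
print("segments:", len(lut[1]), "max abs difference:", max_err)
```

Under these assumptions the table reproduces the network up to floating-point rounding, so any approximation error comes only from how well the network fits the target function, not from the conversion.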
Pages: 577-582
Page count: 6
    [J]. ICNN - 1996 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, VOLS. 1-4, 1996, : 1245 - 1250