An Efficient Piecewise Linear Approximation of Non-linear Operations for Transformer Inference

Cited by: 1
Authors
Lu, Haodong [1]
Mei, Qichang [1]
Wang, Kun [1]
Affiliations
[1] Fudan Univ, State Key Lab ASIC & Syst, Shanghai, Peoples R China
DOI
10.1109/FCCM57271.2023.00034
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Transformer-based models have achieved remarkable performance across various tasks, but their computational complexity is an obstacle to deployment on resource-constrained devices. To this end, this paper proposes NPLA, an efficient framework for approximating non-linear operations during Transformer inference on hardware accelerators. Specifically, NPLA approximates non-linear operations with non-uniform piecewise linear functions and directly converts the coefficients into LUTs for hardware implementation. Experimental results demonstrate that NPLA reduces hardware cost by 13.43x in LUTs and 1.98x in DSPs compared to the state-of-the-art method.
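The idea of a non-uniform piecewise linear approximation can be illustrated with a minimal sketch. The abstract does not say how NPLA selects breakpoints or fits coefficients, so everything below is an assumption made for illustration: GELU stands in for a typical Transformer non-linearity, the breakpoints are hand-picked (denser near zero, where the curve bends most), and each segment is simply the chord between its two endpoints. The names `build_table`, `pwl_eval`, and `BREAKPOINTS` are hypothetical and do not come from the paper.

```python
import bisect
import math


def gelu(x):
    """Exact GELU, used here as the reference non-linear operation."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def build_table(fn, breakpoints):
    """Return one (slope, intercept) pair per segment, using the chord of fn."""
    table = []
    for lo, hi in zip(breakpoints, breakpoints[1:]):
        slope = (fn(hi) - fn(lo)) / (hi - lo)
        intercept = fn(lo) - slope * lo
        table.append((slope, intercept))
    return table


def pwl_eval(x, breakpoints, table):
    """Look up the segment containing x, then apply one multiply and one add."""
    idx = bisect.bisect_right(breakpoints, x) - 1
    # Clamp the segment index so out-of-range inputs extrapolate the edge segments.
    idx = min(max(idx, 0), len(table) - 1)
    slope, intercept = table[idx]
    return slope * x + intercept


# Hypothetical non-uniform breakpoints (an assumption, not taken from the paper):
# narrow segments near 0, wide segments in the nearly linear tails.
BREAKPOINTS = [-6.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0,
               0.5, 1.0, 1.5, 2.0, 3.0, 6.0]
TABLE = build_table(gelu, BREAKPOINTS)


def max_abs_error(samples=2001, lo=-6.0, hi=6.0):
    """Largest absolute deviation from exact GELU on an evenly spaced grid."""
    step = (hi - lo) / (samples - 1)
    xs = [lo + i * step for i in range(samples)]
    return max(abs(gelu(x) - pwl_eval(x, BREAKPOINTS, TABLE)) for x in xs)


if __name__ == "__main__":
    print("segments:", len(TABLE))
    print("max abs error on [-6, 6]:", max_abs_error())
```

In this sketch the coefficient table plays the role of the stored coefficients: evaluation needs only a segment lookup, one multiplication, and one addition. With uniform spacing, the same accuracy near zero would require narrow segments everywhere, which is the kind of cost that non-uniform segmentation is meant to avoid.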
Pages: 206-206
Page count: 1