Quantization and Hardware Architecture Co-Design for Matrix-Vector Multiplications of Large Language Models

Cited by: 1
Authors
Li, Wenjie [1]
Hu, Aokun [1]
Xu, Ningyi [1]
He, Guanghui [2, 3]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai 200240, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Micro Nano Elect, Shanghai 200240, Peoples R China
[3] Shanghai Jiao Tong Univ, MoE Key Lab Artificial Intelligence, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Large language models; quantization; hardware architecture; precision-scalable; outlier;
DOI
10.1109/TCSI.2024.3350661
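Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract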
Large language models (LLMs) have sparked a new revolution in the field of natural language processing (NLP) and have garnered tremendous attention in both academic research and everyday life, thanks to their unprecedented performance in a wide range of applications. However, their deployment remains a significant challenge, primarily due to their intensive computational and memory requirements. Hardware acceleration and efficient quantization are promising solutions to these two issues. In this paper, a quantization and hardware architecture co-design is presented for matrix-vector multiplications (MVMs) of LLMs. During quantization, we uniformly group weights and activations to ensure workload balance for hardware. To enhance the performance of quantization, we further propose two approaches called channel sorting and channel selection, which can be applied simultaneously. To support the proposed quantization scheme, we develop two precision-scalable MVM hardware architectures, designed for high speed and high energy efficiency, respectively. Experimental results show that our proposed quantization scheme achieves state-of-the-art performance among all the reported post-training schemes that quantize both weights and activations into integers. Compared to the MVM architecture of the state-of-the-art LLM accelerator OliVe, our design exhibits significant advantages in terms of area efficiency and energy efficiency.
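To make the idea of uniformly grouped quantization for an MVM concrete, the following is a minimal sketch, not the scheme from the paper. It assumes plain symmetric integer quantization with one scale per equal-sized group along the input dimension; the function names `quantize_group` and `quantized_mvm`, the group size, and the bit width are all hypothetical choices for illustration. Channel sorting, channel selection, and outlier handling are not modeled.

```python
# Illustrative sketch only: generic symmetric group-wise integer quantization.
# Group size, bit width, and function names are assumptions, not details
# taken from the paper.

def quantize_group(values, bits=4):
    """Quantize one group to signed integers that share a single scale."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def quantized_mvm(weight, x, group_size=4, bits=4):
    """Approximate weight @ x with equal-sized groups along the input dimension."""
    n = len(x)
    assert n % group_size == 0, "input length must be a multiple of group_size"
    starts = range(0, n, group_size)
    # Activations are quantized once per group and reused for every output row.
    x_groups = [quantize_group(x[s:s + group_size], bits) for s in starts]
    out = []
    for row in weight:
        acc = 0.0
        for (xq, xs), s in zip(x_groups, starts):
            wq, ws = quantize_group(row[s:s + group_size], bits)
            int_dot = sum(a * b for a, b in zip(wq, xq))  # integer multiply-accumulate
            acc += int_dot * ws * xs  # rescale each group's partial sum
        out.append(acc)
    return out
```

Because every group has the same number of elements, each group contributes the same number of integer multiply-accumulates, which is one simple way to picture the workload balance the abstract mentions.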
Pages: 2858 - 2871
Number of pages: 14
Related papers
50 in total
  • [21] When Neural Architecture Search Meets Hardware Implementation: from Hardware Awareness to Co-Design
    Zhang, Xinyi
    Jiang, Weiwen
    Shi, Yiyu
    Hu, Jingtong
    [J]. 2019 IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI (ISVLSI 2019), 2019: 25 - 30
  • [22] Hardware/software co-design of global cloud system resolving models
    Wehner, Michael F.
    Oliker, Leonid
    Shalf, John
    Donofrio, David
    Drummond, Leroy A.
    Heikes, Ross
    Kamil, Shoaib
    Kono, Celal
    Miller, Norman
    Miura, Hiroaki
    Mohiyuddin, Marghoob
    Randall, David
    Yang, Woo-Sun
    [J]. JOURNAL OF ADVANCES IN MODELING EARTH SYSTEMS, 2011, 3
  • [23] Circuit Architecture Test Verification Based on Hardware Software Co-design with ModelSim
    Das, Sunil R.
    Li, Jun-Feng
    Nayak, Amiya R.
    Assaf, Mansour H.
    Petriu, Emil M.
    Biswas, Satyendra N.
    [J]. IETE JOURNAL OF RESEARCH, 2013, 59 (02) : 132 - 140
  • [24] An Efficient Architecture for a TCP Offload Engine Based on Hardware/Software Co-design
    Jang, Hankook
    Chung, Sang-Hwa
    Kim, Dung Kyue
    Lee, Yun-Sung
    [J]. JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2011, 27 (02) : 493 - 509
  • [25] SCORCH: Neural Architecture Search and Hardware Accelerator Co-design with Reinforcement Learning
    Liu, Siqin
    Karanth, Avinash
    [J]. 2024 25TH INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN, ISQED 2024, 2024
  • [26] Efficient architecture and implementation of vector median filter in co-design context
    Boudabous, Anis
    Khriji, Lazhar
    Ben Atitallah, A.
    Kadionik, P.
    Masmoudi, Nouri
    [J]. RADIOENGINEERING, 2007, 16 (03) : 113 - 119
  • [27] The High Level Architecture (HLA) on Photonic Torus: Hardware and Software Co-design
    Imre, Kayhan
    Sevim, Nevzat
    [J]. 2013 8TH EUROSIM CONGRESS ON MODELLING AND SIMULATION (EUROSIM), 2013: 550 - 554
  • [28] Long overdue unified hardware/software co-design language comes to light
    Wong, S
    [J]. ELECTRONIC DESIGN, 1998, 46 (11) : 60 - +
  • [29] Long overdue unified hardware/software co-design language comes to light
    [J]. Electronic Design, 2009, 46 (11) : 60 - 62
  • [30] Exploring Large Language Models for Verilog hardware design generation
    D'Hollander, Erik H.
    Danneels, Ewout
    Decorte, Karel-Brecht
    Loobuyck, Senne
    Vanheule, Ame
    Van Kets, Ian
    Stroobandt, Dirk
    [J]. 2024 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, IPDPSW 2024, 2024: 111 - 115