Hardware-oriented algorithms for softmax and layer normalization of large language models

Cited: 0
Authors
Li, Wenjie [1]
Lyu, Dongxu [1]
Wang, Gang [1]
Hu, Aokun [1]
Xu, Ningyi [1]
He, Guanghui [1,2,3]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai 200241, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Micro Nano Elect, Shanghai 200241, Peoples R China
[3] Shanghai Jiao Tong Univ, MoE Key Lab Artificial Intelligence, Shanghai 200241, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
large language model; softmax; layer normalization; hardware architecture; Transformer;
DOI
10.1007/s11432-024-4137-4
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
As large language models (LLMs) spark a new revolution in natural language processing (NLP), their hardware accelerators have garnered tremendous attention. However, softmax and layer normalization, the most common non-linear operations in LLMs, are frequently overlooked. This paper presents hardware-oriented algorithms for both the softmax and the layer normalization of LLMs. We propose an approximate approach to implementing the division in softmax and extend it to simultaneously compute the square root and perform the division in layer normalization; it replaces the original computation with multiplication and shifting. For softmax, we further approximate the exponential function by truncating its exponent and then reuse the subtraction already involved. For layer normalization, we additionally simplify the computation of the denominator by directly removing the term involving the square of the mean. Furthermore, hardware architectures are developed for the proposed softmax and layer normalization algorithms. They can work as plug-and-play units for LLM accelerators, requiring no fine-tuning and introducing negligible performance loss. Compared with state-of-the-art designs, the proposed softmax architecture saves up to 23.45% of area cost and 17.39% of power consumption, while the proposed layer normalization architecture saves up to 32.70% of area cost and 14.29% of power consumption.
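To make the softmax ideas concrete, below is a minimal Python sketch of the general flavor of such schemes: the exponential is rewritten in base 2 so that its truncated integer exponent becomes a pure shift, and the reciprocal of the denominator is obtained by leading-one detection with one multiply and one shift. The function name shift_softmax, the (2 - g) reciprocal correction, and the use of floating point in place of fixed point are illustrative assumptions, not the paper's exact scheme.

```python
import math

LOG2E = math.log2(math.e)  # constant for rewriting e^t as 2^(t * log2(e))

def shift_softmax(x):
    """Shift-and-multiply softmax sketch (illustrative, floating point)."""
    m = max(x)  # max subtraction: the reused subtraction, keeps u <= 0
    exps = []
    for xi in x:
        u = (xi - m) * LOG2E      # e^(xi - m) = 2^u
        k = math.floor(u)         # truncated integer exponent -> a shift
        f = u - k                 # fractional part in [0, 1)
        exps.append((2.0 ** k) * (2.0 ** f))  # 2^f: small LUT/poly in HW
    s = sum(exps)
    # Reciprocal via leading-one detection: s = 2^k * (1 + g), g in [0, 1),
    # and 1/s ~= (2 - g) * 2^(-(k + 1)): one multiply plus one shift.
    k = math.floor(math.log2(s))
    g = s / (2.0 ** k) - 1.0
    inv_s = (2.0 - g) * (2.0 ** (-(k + 1)))
    return [e * inv_s for e in exps]
```

The (2 - g) correction shown here is exact at power-of-two denominators and within roughly 12% in between; an actual hardware design would tighten it with a small lookup table or an extra correction term.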
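For layer normalization, the abstract describes two ideas: fusing the square root and the division into a single approximate reciprocal square root built from multiplication and shifting, and dropping the mean-square term from the variance. The sketch below combines both; approx_rsqrt, simplified_layernorm, the linear mantissa fit, the scalar gamma/beta, and the omitted epsilon are hedged assumptions for illustration, not the paper's design.

```python
import math

SQRT_HALF = 1.0 / math.sqrt(2.0)  # 2^(-1/2), folded in for odd exponents

def approx_rsqrt(v):
    """1/sqrt(v) via exponent halving (a shift) plus one multiply."""
    k = math.floor(math.log2(v))   # leading-one position: v = 2^k * (1 + f)
    f = v / (2.0 ** k) - 1.0       # mantissa fraction in [0, 1)
    corr = 1.0 - (1.0 - SQRT_HALF) * f   # linear fit of 1/sqrt(1 + f)
    r = 2.0 ** (-(k // 2))               # halve the exponent -> a shift
    if k % 2:                            # odd exponent: multiply by 2^(-1/2)
        r *= SQRT_HALF
    return r * corr

def simplified_layernorm(x, gamma, beta):
    """LayerNorm with the mean-square term removed from the variance."""
    n = len(x)
    mu = sum(x) / n
    mean_sq = sum(xi * xi for xi in x) / n  # E[x^2]; the mu^2 term is dropped
    inv_std = approx_rsqrt(mean_sq)         # fused square root + division
    return [gamma * (xi - mu) * inv_std + beta for xi in x]
```

Dropping the mu^2 term of Var = E[x^2] - mu^2 is reasonable when the squared mean is small relative to E[x^2]; the abstract reports that this simplification costs negligible performance for LLMs.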
Pages: 15
Related Papers
50 records in total
  • [31] Hardware Design and Verification with Large Language Models: A Scoping Review, Challenges, and Open Issues
    Abdollahi, Meisam
    Yeganli, Seyedeh Faegheh
    Baharloo, Mohammad
    Baniasadi, Amirali
    ELECTRONICS, 2025, 14 (01)
  • [32] Are Large Language Models All You Need for Task-Oriented Dialogue?
    Hudecek, Vojtech
    Dusek, Ondrej
    24TH MEETING OF THE SPECIAL INTEREST GROUP ON DISCOURSE AND DIALOGUE, SIGDIAL 2023, 2023: 216 - 228
  • [33] Enhancing Troubleshooting Task-Oriented Dialog Systems with Large Language Models
    Zhou, Jiahao
    Zhang, Qiang
    Zhang, Fengda
    Yuan, Caixia
    INTELLIGENT ROBOTICS AND APPLICATIONS, ICIRA 2024, PT VI, 2025, 15206 : 328 - 338
  • [34] Towards the Integration of Large Language Models in an Object-Oriented Programming Course
    Cipriano, Bruno Pereira
    PROCEEDINGS OF THE 2024 CONFERENCE INNOVATION AND TECHNOLOGY IN COMPUTER SCIENCE EDUCATION, VOL 2, ITICSE 2024, 2024: 832 - 833
  • [35] Thinking Enough? Evaluating Advanced Large Language Models' Reasoning Algorithms in HEOR
    Swami, S.
    Srivastava, T.
    VALUE IN HEALTH, 2024, 27 (12)
  • [36] Large language models facilitate the generation of electronic health record phenotyping algorithms
    Yan, Chao
    Ong, Henry H.
    Grabowska, Monika E.
    Krantz, Matthew S.
    Su, Wu-Chen
    Dickson, Alyson L.
    Peterson, Josh F.
    Feng, QiPing
    Roden, Dan M.
    Stein, C. Michael
    Kerchberger, V. Eric
    Malin, Bradley A.
    Wei, Wei-Qi
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024, 31 (09) : 1994 - 2001
  • [37] When Large Language Models Meet Evolutionary Algorithms: Potential Enhancements and Challenges
    Wang, Chao
    Zhao, Jiaxuan
    Jiao, Licheng
    Li, Lingling
    Liu, Fang
    Yang, Shuyuan
    RESEARCH, 2025, 8
  • [38] Jailbreaking Pre-trained Large Language Models Towards Hardware Vulnerability Insertion Ability
    Wan, Gwok-Waa
    Wong, Sam-Zaak
    Wang, Xi
    PROCEEDINGS OF THE GREAT LAKES SYMPOSIUM ON VLSI 2024, GLSVLSI 2024, 2024: 579 - 582
  • [39] Integrating Large Language Models and Metaverse in Autonomous Racing: An Education-Oriented Perspective
    Li, Bai
    Xu, Tian'ao
    Li, Xinyuan
    Cui, Yaodong
    Bian, Xuepeng
    Teng, Siyu
    Ma, Siji
    Fan, Lili
    Tian, Yonglin
    Wang, Fei-Yue
    IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2024, 9 (01): 59 - 64
  • [40] A Comprehensive Framework for Evaluating API-oriented Code Generation in Large Language Models
    Wu, Yixi
    He, Pengfei
    Wang, Zehao
    Wang, Shaowei
    Tian, Yuan
    Chen, Tse-Hsun
    arXiv