Hardware-oriented algorithms for softmax and layer normalization of large language models

Cited: 0
Authors
Li, Wenjie [1 ]
Lyu, Dongxu [1 ]
Wang, Gang [1 ]
Hu, Aokun [1 ]
Xu, Ningyi [1 ]
He, Guanghui [1 ,2 ,3 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai 200241, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Micro Nano Elect, Shanghai 200241, Peoples R China
[3] Shanghai Jiao Tong Univ, MoE Key Lab Artificial Intelligence, Shanghai 200241, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
large language model; softmax; layer normalization; hardware architecture; Transformer;
DOI
10.1007/s11432-024-4137-4
CLC Number
TP [automation technology, computer technology];
Discipline Code
0812;
Abstract
As large language models (LLMs) have sparked a new revolution in natural language processing (NLP), their hardware accelerators have garnered tremendous attention. However, softmax and layer normalization, the most common non-linear operations in LLMs, are frequently overlooked. This paper presents hardware-oriented algorithms for both the softmax and layer normalization of LLMs. We propose an approximate approach to implementing division in softmax and extend it to simultaneously compute the square root and perform division in layer normalization, replacing the original computations with multiplication and shifting. For softmax, we further approximate the exponential function by truncating its exponent and then reuse the involved subtraction. For layer normalization, we additionally simplify the computation of the denominator by directly removing the term involving the square of the mean. Furthermore, hardware architectures are developed for the proposed softmax and layer normalization algorithms. They can serve as plug-and-play units for LLM accelerators, requiring no fine-tuning and introducing negligible performance loss. Compared with state-of-the-art designs, the proposed softmax architecture saves up to 23.45% in area cost and 17.39% in power consumption, while the proposed layer normalization architecture saves up to 32.70% in area cost and 14.29% in power consumption.
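The abstract describes these approximations only at a high level. The NumPy sketch below is a minimal illustration of the general flavor of such hardware-oriented rewrites, not the paper's algorithm: the base-2 exponential with a truncated fractional exponent, the power-of-two split of the softmax denominator, and the even-power-of-two split that fuses square root and division are standard hardware idioms consistent with the abstract's wording ("multiplication and shifting"), but all function names, bit widths, and the small reciprocal/root lookup steps are our assumptions.

    import numpy as np

    def exp2_truncated(u, frac_bits=3):
        # 2**u with u truncated to `frac_bits` fractional bits: the integer
        # part becomes a shift, the truncated fraction indexes a tiny LUT.
        u_q = np.floor(u * (1 << frac_bits)) / (1 << frac_bits)
        k = np.floor(u_q)                   # shift amount
        f = u_q - k                         # one of 2**frac_bits LUT entries
        return np.exp2(k) * np.exp2(f)

    def approx_softmax(x):
        # exp(z) = 2**(z * log2(e)); subtracting the max keeps exponents <= 0.
        u = (x - x.max()) * np.log2(np.e)
        p = exp2_truncated(u)
        s = p.sum()
        m = np.floor(np.log2(s))            # leading-one position of the sum
        r = s * np.exp2(-m)                 # residual factor in [1, 2)
        # Dividing by s becomes a right shift by m plus one multiply by 1/r.
        # In hardware 1/r would come from a small LUT or linear fit; it is
        # computed exactly here only to keep the sketch short.
        return p * np.exp2(-m) * (1.0 / r)

    def approx_layernorm(x, gamma, beta):
        # Per the abstract, the mean-square term is dropped from the variance,
        # so the denominator is sqrt(E[x**2]) rather than the exact std.
        mu = x.mean()
        d = np.dot(x, x) / x.size           # E[x**2]
        e = np.floor(np.log2(d) / 2.0)      # write d = 4**e * r with r in [1, 4)
        r = d * np.exp2(-2.0 * e)
        inv_sqrt = np.exp2(-e) * (1.0 / np.sqrt(r))  # shift times one LUT value
        return (x - mu) * inv_sqrt * gamma + beta

On random inputs, approx_softmax(x).sum() stays close to 1, and approx_layernorm tracks exact layer normalization whenever the mean of x is small relative to its RMS, which is the regime in which the paper reports negligible performance loss.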
Pages: 15
Related Papers
50 records in total
  • [21] WiP: Towards Light Adaptation of Large Language Models For Personal Hardware
    Wang, Liangyu
    Wang, Junxiao
    Wang, Di
    PROCEEDINGS OF THE 2024 WORKSHOP ON EDGE AND MOBILE FOUNDATION MODELS, EDGEFM 2024, 2024, : 30 - 32
  • [22] On Hardware Security Bug Code Fixes by Prompting Large Language Models
    Ahmad, Baleegh
    Thakur, Shailja
    Tan, Benjamin
    Karri, Ramesh
    Pearce, Hammond
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2024, 19 : 4043 - 4057
  • [23] Large Language Models for Clinical Text Cleansing Enhance Medical Concept Normalization
    Abdulnazar, Akhila
    Roller, Roland
    Schulz, Stefan
    Kreuzthaler, Markus
    IEEE ACCESS, 2024, 12 : 147981 - 147990
  • [24] Fine-tuning large language models for rare disease concept normalization
    Wang, Andy
    Liu, Cong
    Yang, Jingye
    Weng, Chunhua
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024, 31 (09) : 2076 - 2083
  • [25] Hardware-Oriented Early Detection Algorithms for 4 × 4 and 8 × 8 All-Zero Blocks in H.264
    Liu, Qin
    Huang, Yiqing
    Goto, Satoshi
    Ikenaga, Takeshi
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2009, E92A (04): : 1063 - 1071
  • [26] Studying large language models as compression algorithms for human culture
    Buttrick, Nicholas
    TRENDS IN COGNITIVE SCIENCES, 2024, 28 (03) : 187 - 189
  • [27] Refactoring goal-oriented models: a linguistic improvement using large language models
    Alturayeif, Nouf
    Hassine, Jameleddine
    SOFTWARE AND SYSTEMS MODELING, 2025
  • [28] Comparison of Machine Learning Algorithms and Large Language Models for Product Categorization
    Ihsanoglu, Abdullah
    Zaval, Mounes
    Yildiz, Olcay Taner
    32ND IEEE SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU 2024, 2024
  • [29] Leveraging Large Language Models for the Generation of Novel Metaheuristic Optimization Algorithms
    Pluhacek, Michal
    Kazikova, Anezka
    Kadavy, Tomas
    Viktorin, Adam
    Senkerik, Roman
    PROCEEDINGS OF THE 2023 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE COMPANION, GECCO 2023 COMPANION, 2023, : 1812 - 1820
  • [30] A Generalized Hardware Debugging Approach for Large Language Models: Semi-Synthetic Datasets
    Fu, Weimin
    Li, Shijie
    Zhao, Yifang
    Yang, Kaichen
    Zhang, Xuan
    Jin, Yier
    Guo, Xiaolong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2025, 72 (02) : 623 - 636