Hardware-oriented algorithms for softmax and layer normalization of large language models

Cited: 0
Authors
Li, Wenjie [1 ]
Lyu, Dongxu [1 ]
Wang, Gang [1 ]
Hu, Aokun [1 ]
Xu, Ningyi [1 ]
He, Guanghui [1 ,2 ,3 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai 200241, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Micro Nano Elect, Shanghai 200241, Peoples R China
[3] Shanghai Jiao Tong Univ, MoE Key Lab Artificial Intelligence, Shanghai 200241, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
large language model; softmax; layer normalization; hardware architecture; Transformer;
DOI
10.1007/s11432-024-4137-4
CLC Number
TP [automation technology, computer technology];
Discipline Code
0812;
Abstract
As large language models (LLMs) have sparked a new revolution in natural language processing (NLP), their hardware accelerators have garnered tremendous attention. However, softmax and layer normalization, the most common non-linear operations in LLMs, are frequently overlooked. This paper presents hardware-oriented algorithms for both the softmax and layer normalization of LLMs. We propose an approximate approach to implementing division in softmax and extend it to simultaneously compute the square root and perform division in layer normalization, replacing the original computations with multiplication and shifting. For softmax, we further approximate the exponential function by truncating its exponent and then reuse the involved subtraction. For layer normalization, we additionally simplify the computation of the denominator by directly removing the term involving the square of the mean. Furthermore, hardware architectures are developed for the proposed softmax and layer normalization algorithms. They can serve as plug-and-play units for LLM accelerators, requiring no fine-tuning and introducing negligible performance loss. Compared with state-of-the-art designs, the proposed softmax architecture saves up to 23.45% in area cost and 17.39% in power consumption, while the proposed layer normalization architecture saves up to 32.70% in area cost and 14.29% in power consumption.
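The abstract describes these approximations only at a high level. The NumPy sketch below is a minimal illustration of the general flavor of such hardware-oriented rewrites, not the paper's algorithm: the base-2 exponential with a truncated fractional exponent, the power-of-two split of the softmax denominator, and the even-power-of-two split that fuses square root and division are standard hardware idioms consistent with the abstract's wording ("multiplication and shifting"), but all function names, bit widths, and the small reciprocal/root lookup steps are our assumptions.

    import numpy as np

    def exp2_truncated(u, frac_bits=3):
        # 2**u with u truncated to `frac_bits` fractional bits: the integer
        # part becomes a shift, the truncated fraction indexes a tiny LUT.
        u_q = np.floor(u * (1 << frac_bits)) / (1 << frac_bits)
        k = np.floor(u_q)                   # shift amount
        f = u_q - k                         # one of 2**frac_bits LUT entries
        return np.exp2(k) * np.exp2(f)

    def approx_softmax(x):
        # exp(z) = 2**(z * log2(e)); subtracting the max keeps exponents <= 0.
        u = (x - x.max()) * np.log2(np.e)
        p = exp2_truncated(u)
        s = p.sum()
        m = np.floor(np.log2(s))            # leading-one position of the sum
        r = s * np.exp2(-m)                 # residual factor in [1, 2)
        # Dividing by s becomes a right shift by m plus one multiply by 1/r.
        # In hardware 1/r would come from a small LUT or linear fit; it is
        # computed exactly here only to keep the sketch short.
        return p * np.exp2(-m) * (1.0 / r)

    def approx_layernorm(x, gamma, beta):
        # Per the abstract, the mean-square term is dropped from the variance,
        # so the denominator is sqrt(E[x**2]) rather than the exact std.
        mu = x.mean()
        d = np.dot(x, x) / x.size           # E[x**2]
        e = np.floor(np.log2(d) / 2.0)      # write d = 4**e * r with r in [1, 4)
        r = d * np.exp2(-2.0 * e)
        inv_sqrt = np.exp2(-e) * (1.0 / np.sqrt(r))  # shift times one LUT value
        return (x - mu) * inv_sqrt * gamma + beta

On random inputs, approx_softmax(x).sum() stays close to 1, and approx_layernorm tracks exact layer normalization whenever the mean of x is small relative to its RMS, which is the regime in which the paper reports negligible performance loss.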
Pages: 15
Related Papers
50 records in total
  • [21] WiP: Towards Light Adaptation of Large Language Models For Personal Hardware
    Wang, Liangyu
    Wang, Junxiao
    Wang, Di
    PROCEEDINGS OF THE 2024 WORKSHOP ON EDGE AND MOBILE FOUNDATION MODELS, EDGEFM 2024, 2024, : 30 - 32
  • [22] On Hardware Security Bug Code Fixes by Prompting Large Language Models
    Ahmad, Baleegh
    Thakur, Shailja
    Tan, Benjamin
    Karri, Ramesh
    Pearce, Hammond
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2024, 19 : 4043 - 4057
  • [23] Large Language Models for Clinical Text Cleansing Enhance Medical Concept Normalization
    Abdulnazar, Akhila
    Roller, Roland
    Schulz, Stefan
    Kreuzthaler, Markus
    IEEE ACCESS, 2024, 12 : 147981 - 147990
  • [24] Fine-tuning large language models for rare disease concept normalization
    Wang, Andy
    Liu, Cong
    Yang, Jingye
    Weng, Chunhua
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024, 31 (09) : 2076 - 2083
  • [25] Hardware-Oriented Early Detection Algorithms for 4 × 4 and 8 × 8 All-Zero Blocks in H.264
    Liu, Qin
    Huang, Yiqing
    Goto, Satoshi
    Ikenaga, Takeshi
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2009, E92A (04): : 1063 - 1071
  • [26] Studying large language models as compression algorithms for human culture
    Buttrick, Nicholas
    TRENDS IN COGNITIVE SCIENCES, 2024, 28 (03) : 187 - 189
  • [27] Refactoring goal-oriented models: a linguistic improvement using large language models
    Alturayeif, Nouf
    Hassine, Jameleddine
    SOFTWARE AND SYSTEMS MODELING, 2025
  • [28] Comparison of Machine Learning Algorithms and Large Language Models for Product Categorization
    Ihsanoglu, Abdullah
    Zaval, Mounes
    Yildiz, Olcay Taner
    32ND IEEE SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU 2024, 2024
  • [29] Leveraging Large Language Models for the Generation of Novel Metaheuristic Optimization Algorithms
    Pluhacek, Michal
    Kazikova, Anezka
    Kadavy, Tomas
    Viktorin, Adam
    Senkerik, Roman
    PROCEEDINGS OF THE 2023 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE COMPANION, GECCO 2023 COMPANION, 2023, : 1812 - 1820
  • [30] A Generalized Hardware Debugging Approach for Large Language Models: Semi-Synthetic Datasets
    Fu, Weimin
    Li, Shijie
    Zhao, Yifang
    Yang, Kaichen
    Zhang, Xuan
    Jin, Yier
    Guo, Xiaolong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2025, 72 (02) : 623 - 636