Enabling Energy-Efficient Deployment of Large Language Models on Memristor Crossbar: A Synergy of Large and Small

Cited by: 0
Authors
Wang, Zhehui [1 ]
Luo, Tao [1 ]
Liu, Cheng [2 ]
Liu, Weichen [3 ]
Goh, Rick Siow Mong [1 ]
Wong, Weng-Fai [4 ]
Affiliations
[1] A*STAR, Inst High Performance Comp (IHPC), Singapore 138632, Singapore
[2] Chinese Acad Sci, Beijing 100045, Peoples R China
[3] Nanyang Technol Univ, Singapore 639798, Singapore
[4] Natl Univ Singapore, Singapore 119077, Singapore
Keywords
Memristors; Computer architecture; Random access memory; Nonvolatile memory; Computational modeling; Neural networks; Mathematical models; Energy efficiency; Energy consumption; Transistors; Large language model (LLM); memristor crossbar; model deployment; natural language processing; non-volatile memory; MEMORY; RERAM;
DOI
10.1109/TPAMI.2024.3483654
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large language models (LLMs) have garnered substantial attention due to their promising applications in diverse domains. Nevertheless, the increasing size of LLMs comes with a significant surge in the computational requirements for training and deployment. Memristor crossbars have emerged as a promising solution, having demonstrated a small footprint and remarkably high energy efficiency in computer vision (CV) models. Memristors offer higher density than conventional memory technologies, making them well suited to handling the extreme model sizes associated with LLMs. However, deploying LLMs on memristor crossbars faces three major challenges. First, the size of LLMs is growing rapidly and already exceeds the capacity of state-of-the-art memristor chips. Second, LLMs often incorporate multi-head attention blocks, which involve non-weight-stationary multiplications that traditional memristor crossbars cannot support. Third, while memristor crossbars excel at linear operations, they cannot execute the complex nonlinear operations in LLMs, such as softmax and layer normalization. To address these challenges, we present a novel memristor crossbar architecture that enables the deployment of a state-of-the-art LLM on a single chip or package, eliminating the energy and time inefficiencies associated with off-chip communication. Our testing on BERT-Large showed negligible accuracy loss. Compared to traditional memristor crossbars, our architecture reduces area overhead by up to 39x and energy consumption by up to 18x. Compared to modern TPU/GPU systems, our architecture demonstrates at least a 68x reduction in area-delay product and a 69% reduction in energy consumption.
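The abstract's second and third challenges rest on the distinction between weight-stationary and non-weight-stationary operations. The NumPy sketch below is purely illustrative and not taken from the paper; the IdealCrossbar class, variable names, and dimensions are hypothetical. It shows why projection weights such as W_Q and W_K map naturally onto crossbar conductances (programmed once, reused for every input), while the attention-score product Q K^T involves two activation matrices that change with every input, and softmax requires nonlinear logic outside the crossbar.

```python
import numpy as np

# --- Weight-stationary operation: an idealized memristor crossbar ---
# The weight matrix is programmed once into the crossbar as conductances;
# every subsequent input vector reuses the same stored weights (Ohm's law +
# Kirchhoff's current law realize the matrix-vector product in analog).
class IdealCrossbar:                        # hypothetical, illustrative only
    def __init__(self, weights: np.ndarray):
        # Conductances are fixed after programming (non-volatile storage).
        self.conductances = weights.copy()

    def mvm(self, voltages: np.ndarray) -> np.ndarray:
        # Output currents = conductance matrix @ input voltages.
        return self.conductances @ voltages

d_model, d_head, seq_len = 8, 4, 3
rng = np.random.default_rng(0)

# Projection weights (W_Q, W_K) are static -> crossbar-friendly.
W_q = IdealCrossbar(rng.standard_normal((d_head, d_model)))
W_k = IdealCrossbar(rng.standard_normal((d_head, d_model)))

tokens = rng.standard_normal((seq_len, d_model))
Q = np.stack([W_q.mvm(x) for x in tokens])   # weight-stationary MVMs
K = np.stack([W_k.mvm(x) for x in tokens])

# --- Non-weight-stationary operation: attention scores Q @ K^T ---
# Both operands are activations that change with every input sequence,
# so neither can be pre-programmed into a traditional crossbar.
scores = Q @ K.T / np.sqrt(d_head)

# --- Nonlinear operation the crossbar cannot execute natively ---
def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

attn = softmax(scores)                        # needs digital/peripheral logic
print(attn.shape)                             # (seq_len, seq_len)
```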
Pages: 916-933
Number of pages: 18
Related papers
50 records in total
  • [31] Question Generation Capabilities of "Small" Large Language Models
    Berger, Joshua
    Koss, Jonathan
    Stamatakis, Markos
    Hoppe, Anett
    Ewerth, Ralph
    Wartena, Christian
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PT II, NLDB 2024, 2024, 14763 : 183 - 194
  • [32] XAI-driven knowledge distillation of large language models for efficient deployment on low-resource devices
    Cantini, Riccardo
    Orsino, Alessio
    Talia, Domenico
    JOURNAL OF BIG DATA, 2024, 11 (01)
  • [33] Energy-efficient data gathering in large wireless sensor networks
    Lu, KZ
    Huang, LS
    Wan, YY
    Xu, HL
    ICESS 2005: SECOND INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS, 2005, : 327 - 331
  • [34] Energy-Efficient Large-Scale Matrix Multiplication on FPGAs
    Matam, Kiran Kumar
    Prasanna, Viktor K.
    2013 INTERNATIONAL CONFERENCE ON RECONFIGURABLE COMPUTING AND FPGAS (RECONFIG), 2013,
  • [35] Reliable Memristor Crossbar Array Based on 2D Layered Nickel Phosphorus Trisulfide for Energy-Efficient Neuromorphic Hardware
    Weng, Zhengjin
    Zheng, Haofei
    Li, Lingqi
    Lei, Wei
    Jiang, Helong
    Ang, Kah-Wee
    Zhao, Zhiwei
    SMALL, 2024, 20 (05)
  • [36] AxLaM: energy-efficient accelerator design for language models for edge computing
    Glint, Tom
    Mittal, Bhumika
    Sharma, Santripta
    Ronak, Abdul Qadir
    Goud, Abhinav
    Kasture, Neerja
    Momin, Zaqi
    Krishna, Aravind
    Mekie, Joycee
    PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 2025, 383 (2288):
  • [37] Enabling access to large-language models (LLMs) at scale for higher education
    Nadel, Peter
    Maloney, Delilah
    Monahan, Kyle M.
    PRACTICE AND EXPERIENCE IN ADVANCED RESEARCH COMPUTING 2024, PEARC 2024, 2024,
  • [38] Optimal Base Station Deployment for Small Cell Networks with Energy-Efficient Power Control
    Peng, Ching-Ting
    Wang, Li-Chun
    Liu, Chun-Hung
    2015 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2015, : 1863 - 1868
  • [39] Leveraging Large Language Models for Efficient Alert Aggregation in AIOPs
    Zha, Junjie
    Shan, Xinwen
    Lu, Jiaxin
    Zhu, Jiajia
    Liu, Zihan
    ELECTRONICS, 2024, 13 (22)
  • [40] A Method for Efficient Structured Data Generation with Large Language Models
    Hou, Zongzhi
    Zhao, Ruohan
    Li, Zhongyang
    Wang, Zheng
    Wu, Yizhen
    Gou, Junwei
    Zhu, Zhifeng
    PROCEEDINGS OF THE 2ND WORKSHOP ON LARGE GENERATIVE MODELS MEET MULTIMODAL APPLICATIONS, LGM(CUBE)A 2024, 2024, : 36 - 44