Enabling Energy-Efficient Deployment of Large Language Models on Memristor Crossbar: A Synergy of Large and Small

Cited by: 0
Authors
Wang, Zhehui [1 ]
Luo, Tao [1 ]
Liu, Cheng [2 ]
Liu, Weichen [3 ]
Goh, Rick Siow Mong [1 ]
Wong, Weng-Fai [4 ]
Affiliations
[1] A*STAR, Inst High Performance Comp (IHPC), Singapore 138632, Singapore
[2] Chinese Acad Sci, Beijing 100045, Peoples R China
[3] Nanyang Technol Univ, Singapore 639798, Singapore
[4] Natl Univ Singapore, Singapore 119077, Singapore
Keywords
Memristors; Computer architecture; Random access memory; Nonvolatile memory; Computational modeling; Neural networks; Mathematical models; Energy efficiency; Energy consumption; Transistors; Large language model (LLM); memristor crossbar; model deployment; natural language processing; non-volatile memory; MEMORY; RERAM;
DOI
10.1109/TPAMI.2024.3483654
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large language models (LLMs) have garnered substantial attention due to their promising applications in diverse domains. Nevertheless, the increasing size of LLMs comes with a significant surge in the computational requirements for training and deployment. Memristor crossbars have emerged as a promising solution, having demonstrated a small footprint and remarkably high energy efficiency in computer vision (CV) models. Memristors offer higher density than conventional memory technologies, making them well suited to handling the extreme model sizes associated with LLMs. However, deploying LLMs on memristor crossbars faces three major challenges. First, the size of LLMs is growing rapidly and already exceeds the capacity of state-of-the-art memristor chips. Second, LLMs often incorporate multi-head attention blocks, which involve non-weight-stationary multiplications that traditional memristor crossbars cannot support. Third, while memristor crossbars excel at linear operations, they cannot execute the complex nonlinear operations in LLMs, such as softmax and layer normalization. To address these challenges, we present a novel memristor crossbar architecture that enables the deployment of a state-of-the-art LLM on a single chip or package, eliminating the energy and time inefficiencies associated with off-chip communication. Our testing on BERT-Large showed negligible accuracy loss. Compared to traditional memristor crossbars, our architecture reduces area overhead by up to 39x and energy consumption by up to 18x. Compared to modern TPU/GPU systems, our architecture demonstrates at least a 68x reduction in area-delay product and a 69% reduction in energy consumption.
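The abstract's second and third challenges rest on the distinction between weight-stationary and non-weight-stationary operations. The NumPy sketch below is purely illustrative and not taken from the paper; the IdealCrossbar class, variable names, and dimensions are hypothetical. It shows why projection weights such as W_Q and W_K map naturally onto crossbar conductances (programmed once, reused for every input), while the attention-score product Q K^T involves two activation matrices that change with every input, and softmax requires nonlinear logic outside the crossbar.

```python
import numpy as np

# --- Weight-stationary operation: an idealized memristor crossbar ---
# The weight matrix is programmed once into the crossbar as conductances;
# every subsequent input vector reuses the same stored weights (Ohm's law +
# Kirchhoff's current law realize the matrix-vector product in analog).
class IdealCrossbar:                        # hypothetical, illustrative only
    def __init__(self, weights: np.ndarray):
        # Conductances are fixed after programming (non-volatile storage).
        self.conductances = weights.copy()

    def mvm(self, voltages: np.ndarray) -> np.ndarray:
        # Output currents = conductance matrix @ input voltages.
        return self.conductances @ voltages

d_model, d_head, seq_len = 8, 4, 3
rng = np.random.default_rng(0)

# Projection weights (W_Q, W_K) are static -> crossbar-friendly.
W_q = IdealCrossbar(rng.standard_normal((d_head, d_model)))
W_k = IdealCrossbar(rng.standard_normal((d_head, d_model)))

tokens = rng.standard_normal((seq_len, d_model))
Q = np.stack([W_q.mvm(x) for x in tokens])   # weight-stationary MVMs
K = np.stack([W_k.mvm(x) for x in tokens])

# --- Non-weight-stationary operation: attention scores Q @ K^T ---
# Both operands are activations that change with every input sequence,
# so neither can be pre-programmed into a traditional crossbar.
scores = Q @ K.T / np.sqrt(d_head)

# --- Nonlinear operation the crossbar cannot execute natively ---
def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

attn = softmax(scores)                        # needs digital/peripheral logic
print(attn.shape)                             # (seq_len, seq_len)
```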
Pages: 916-933
Number of pages: 18
Related papers
50 records in total
  • [31] Question Generation Capabilities of "Small" Large Language Models
    Berger, Joshua
    Koss, Jonathan
    Stamatakis, Markos
    Hoppe, Anett
    Ewerth, Ralph
    Wartena, Christian
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PT II, NLDB 2024, 2024, 14763 : 183 - 194
  • [32] XAI-driven knowledge distillation of large language models for efficient deployment on low-resource devices
    Cantini, Riccardo
    Orsino, Alessio
    Talia, Domenico
    JOURNAL OF BIG DATA, 2024, 11 (01)
  • [33] Energy-efficient data gathering in large wireless sensor networks
    Lu, KZ
    Huang, LS
    Wan, YY
    Xu, HL
    ICESS 2005: SECOND INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS, 2005, : 327 - 331
  • [34] Energy-Efficient Large-Scale Matrix Multiplication on FPGAs
    Matam, Kiran Kumar
    Prasanna, Viktor K.
    2013 INTERNATIONAL CONFERENCE ON RECONFIGURABLE COMPUTING AND FPGAS (RECONFIG), 2013,
  • [35] Reliable Memristor Crossbar Array Based on 2D Layered Nickel Phosphorus Trisulfide for Energy-Efficient Neuromorphic Hardware
    Weng, Zhengjin
    Zheng, Haofei
    Li, Lingqi
    Lei, Wei
    Jiang, Helong
    Ang, Kah-Wee
    Zhao, Zhiwei
    SMALL, 2024, 20 (05)
  • [36] AxLaM: energy-efficient accelerator design for language models for edge computing
    Glint, Tom
    Mittal, Bhumika
    Sharma, Santripta
    Ronak, Abdul Qadir
    Goud, Abhinav
    Kasture, Neerja
    Momin, Zaqi
    Krishna, Aravind
    Mekie, Joycee
    PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 2025, 383 (2288):
  • [37] Enabling access to large-language models (LLMs) at scale for higher education
    Nadel, Peter
    Maloney, Delilah
    Monahan, Kyle M.
    PRACTICE AND EXPERIENCE IN ADVANCED RESEARCH COMPUTING 2024, PEARC 2024, 2024,
  • [38] Optimal Base Station Deployment for Small Cell Networks with Energy-Efficient Power Control
    Peng, Ching-Ting
    Wang, Li-Chun
    Liu, Chun-Hung
    2015 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2015, : 1863 - 1868
  • [39] Leveraging Large Language Models for Efficient Alert Aggregation in AIOPs
    Zha, Junjie
    Shan, Xinwen
    Lu, Jiaxin
    Zhu, Jiajia
    Liu, Zihan
    ELECTRONICS, 2024, 13 (22)
  • [40] A Method for Efficient Structured Data Generation with Large Language Models
    Hou, Zongzhi
    Zhao, Ruohan
    Li, Zhongyang
    Wang, Zheng
    Wu, Yizhen
    Gou, Junwei
    Zhu, Zhifeng
    PROCEEDINGS OF THE 2ND WORKSHOP ON LARGE GENERATIVE MODELS MEET MULTIMODAL APPLICATIONS, LGM(CUBE)A 2024, 2024, : 36 - 44