Enabling Energy-Efficient Deployment of Large Language Models on Memristor Crossbar: A Synergy of Large and Small

Cited by: 0
Authors
Wang, Zhehui [1 ]
Luo, Tao [1 ]
Liu, Cheng [2 ]
Liu, Weichen [3 ]
Goh, Rick Siow Mong [1 ]
Wong, Weng-Fai [4 ]
Affiliations
[1] ASTAR, Inst High Performance Comp IHPC, Singapore 138632, Singapore
[2] Chinese Acad Sci, Beijing 100045, Peoples R China
[3] Nanyang Technol Univ, Singapore 639798, Singapore
[4] Natl Univ Singapore, Singapore 119077, Singapore
Keywords
Memristors; Computer architecture; Random access memory; Nonvolatile memory; Computational modeling; Neural networks; Mathematical models; Energy efficiency; Energy consumption; Transistors; Large language model (LLM); memristor crossbar; model deployment; natural language processing; non-volatile memory; MEMORY; RERAM;
DOI
10.1109/TPAMI.2024.3483654
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) have garnered substantial attention due to their promising applications in diverse domains. However, the growing size of LLMs brings a significant surge in the computational requirements for training and deployment. Memristor crossbars have emerged as a promising solution, having demonstrated a small footprint and remarkably high energy efficiency in computer vision (CV) models. Memristors offer higher density than conventional memory technologies, making them well suited to handling the extreme model sizes of LLMs. However, deploying LLMs on memristor crossbars faces three major challenges. First, LLM sizes are growing rapidly and already exceed the capacity of state-of-the-art memristor chips. Second, LLMs often incorporate multi-head attention blocks, which involve non-weight-stationary multiplications that traditional memristor crossbars cannot support. Third, while memristor crossbars excel at linear operations, they cannot execute the complex nonlinear operations in LLMs, such as softmax and layer normalization. To address these challenges, we present a novel memristor crossbar architecture that enables the deployment of a state-of-the-art LLM on a single chip or package, eliminating the energy and time inefficiencies of off-chip communication. Our tests on BERT-Large showed negligible accuracy loss. Compared to traditional memristor crossbars, our architecture achieves reductions of up to 39x in area overhead and 18x in energy consumption. Compared to modern TPU/GPU systems, our architecture demonstrates at least a 68x reduction in the area-delay product and a 69% reduction in energy consumption.
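
The distinction behind the second and third challenges can be made concrete with a minimal NumPy sketch (illustrative only, not code from the paper; all names are hypothetical): a projection weight W is trained once and can therefore be programmed into crossbar conductances, whereas the attention score Q K^T multiplies two freshly computed activations, and softmax is a nonlinear reduction that the analog array cannot evaluate.

    import numpy as np

    # Weight-stationary: W is fixed after training, so its values can be
    # programmed into memristor conductances once and reused for every input.
    def linear_layer(x, W):
        return x @ W  # computed in-crossbar as an analog matrix-vector product

    # Non-weight-stationary: Q and K are both activations recomputed per input,
    # so neither operand can be pre-programmed into the crossbar cells.
    def attention_scores(Q, K):
        return Q @ K.T / np.sqrt(K.shape[-1])

    # Nonlinear reduction: softmax (like layer normalization) needs auxiliary
    # circuitry, since the crossbar itself only performs linear operations.
    def softmax(s):
        e = np.exp(s - s.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 64))             # 4 tokens, model dim 64
    W_q, W_k = rng.standard_normal((2, 64, 64))  # fixed projection weights
    Q, K = linear_layer(x, W_q), linear_layer(x, W_k)  # crossbar-friendly
    A = softmax(attention_scores(Q, K))                # crossbar-hostile steps
    print(A.shape)  # (4, 4) attention matrix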
Pages: 916-933
Number of pages: 18