Enabling Energy-Efficient Deployment of Large Language Models on Memristor Crossbar: A Synergy of Large and Small

Cited by: 0
Authors
Wang, Zhehui [1 ]
Luo, Tao [1 ]
Liu, Cheng [2 ]
Liu, Weichen [3 ]
Goh, Rick Siow Mong [1 ]
Wong, Weng-Fai [4 ]
Affiliations
[1] ASTAR, Inst High Performance Comp IHPC, Singapore 138632, Singapore
[2] Chinese Acad Sci, Beijing 100045, Peoples R China
[3] Nanyang Technol Univ, Singapore 639798, Singapore
[4] Natl Univ Singapore, Singapore 119077, Singapore
Keywords
Memristors; Computer architecture; Random access memory; Nonvolatile memory; Computational modeling; Neural networks; Mathematical models; Energy efficiency; Energy consumption; Transistors; Large language model (LLM); memristor crossbar; model deployment; natural language processing; non-volatile memory; MEMORY; RERAM;
DOI
10.1109/TPAMI.2024.3483654
CLC classification number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Large language models (LLMs) have garnered substantial attention due to their promising applications in diverse domains. Nevertheless, the growing size of LLMs brings a sharp increase in the computational requirements for both training and deployment. Memristor crossbars have emerged as a promising solution, having demonstrated a small footprint and remarkably high energy efficiency on computer vision (CV) models. Memristors offer higher density than conventional memory technologies, making them well suited to holding the extreme model sizes of LLMs. However, deploying LLMs on memristor crossbars faces three major challenges. First, LLM sizes are growing rapidly and already exceed the capacity of state-of-the-art memristor chips. Second, LLMs incorporate multi-head attention blocks, which involve non-weight-stationary multiplications that traditional memristor crossbars cannot support. Third, while memristor crossbars excel at linear operations, they cannot execute the complex nonlinear operations in LLMs, such as softmax and layer normalization. To address these challenges, we present a novel memristor-crossbar architecture that enables the deployment of a state-of-the-art LLM on a single chip or package, eliminating the energy and time inefficiencies associated with off-chip communication. Our testing on BERT-Large showed negligible accuracy loss. Compared to traditional memristor crossbars, our architecture achieves improvements of up to 39x in area overhead and 18x in energy consumption. Compared to modern TPU/GPU systems, it demonstrates at least a 68x reduction in the area-delay product and a 69% reduction in energy consumption.
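To make the second and third challenges concrete, the sketch below is a minimal NumPy toy (the dimensions, variable names, and helper function are illustrative assumptions, not code from the paper). It contrasts the weight-stationary matrix-vector product that a crossbar maps onto naturally with the activation-by-activation score computation and nonlinear softmax found in an attention block.

```python
import numpy as np

def crossbar_mvm(conductance, voltage):
    """Idealized crossbar: the weight matrix is programmed once as cell
    conductances; the input vector is applied as row voltages, and each
    column (bit line) accumulates the current sum_i V_i * G_ij."""
    return voltage @ conductance        # stationary operand: the conductance matrix

rng = np.random.default_rng(0)
d, n = 64, 4                            # head dimension and sequence length (assumed)
W_q = rng.standard_normal((d, d))       # projection weights: fixed, so crossbar-friendly
W_k = rng.standard_normal((d, d))
X = rng.standard_normal((n, d))         # token activations

Q = np.stack([crossbar_mvm(W_q, x) for x in X])   # weight-stationary products
K = np.stack([crossbar_mvm(W_k, x) for x in X])

# Challenge 2: the attention scores multiply two activations (Q and K), so neither
# operand can be pre-programmed as conductances -- a non-weight-stationary product.
scores = Q @ K.T / np.sqrt(d)

# Challenge 3: softmax (like layer normalization) is nonlinear, which the analog
# crossbar array itself cannot execute.
attn = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
print(attn.shape)                       # (4, 4)
```

In a weight-stationary product, only inputs and outputs move while the weights stay resident as conductances; the attention score has no resident operand, which is why it requires dedicated support in the proposed architecture.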
Pages: 916 - 933
Number of pages: 18