Enabling Energy-Efficient Deployment of Large Language Models on Memristor Crossbar: A Synergy of Large and Small

Cited by: 0
Authors
Wang, Zhehui [1 ]
Luo, Tao [1 ]
Liu, Cheng [2 ]
Liu, Weichen [3 ]
Goh, Rick Siow Mong [1 ]
Wong, Weng-Fai [4 ]
Affiliations
[1] A*STAR, Institute of High Performance Computing (IHPC), Singapore 138632, Singapore
[2] Chinese Academy of Sciences, Beijing 100045, China
[3] Nanyang Technological University, Singapore 639798, Singapore
[4] National University of Singapore, Singapore 119077, Singapore
Keywords
Memristors; Computer architecture; Random access memory; Nonvolatile memory; Computational modeling; Neural networks; Mathematical models; Energy efficiency; Energy consumption; Transistors; Large language model (LLM); memristor crossbar; model deployment; natural language processing; non-volatile memory; MEMORY; RERAM
DOI
10.1109/TPAMI.2024.3483654
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large language models (LLMs) have garnered substantial attention due to their promising applications in diverse domains. Nevertheless, the increasing size of LLMs comes with a significant surge in the computational requirements for training and deployment. Memristor crossbars have emerged as a promising solution, having demonstrated a small footprint and remarkably high energy efficiency in computer vision (CV) models. Memristors offer higher density than conventional memory technologies, making them well suited to managing the extreme model sizes of LLMs. However, deploying LLMs on memristor crossbars faces three major challenges. First, LLM sizes are growing rapidly and already exceed the capacity of state-of-the-art memristor chips. Second, LLMs incorporate multi-head attention blocks, which involve non-weight-stationary multiplications that traditional memristor crossbars cannot support. Third, while memristor crossbars excel at linear operations, they cannot execute the complex nonlinear operations in LLMs, such as softmax and layer normalization. To address these challenges, we present a novel memristor-crossbar architecture that enables the deployment of a state-of-the-art LLM on a single chip or package, eliminating the energy and time inefficiencies of off-chip communication. Our tests on BERT-Large showed negligible accuracy loss. Compared with traditional memristor crossbars, our architecture reduces area overhead by up to 39x and energy consumption by up to 18x. Compared with modern TPU/GPU systems, it achieves at least a 68x reduction in area-delay product and a 69% reduction in energy consumption.
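To make the second and third challenges concrete, the minimal NumPy sketch below (dimensions and variable names are illustrative, not taken from the paper) shows why the attention-score product cannot be mapped onto a weight-stationary crossbar, and why softmax lies outside the crossbar's linear operations:

import numpy as np

# Illustrative dimensions only; not from the paper.
d_model, seq_len = 8, 4
rng = np.random.default_rng(0)

# Fixed projection weights: "weight-stationary" operands that map naturally
# onto a memristor crossbar, where each weight is programmed once as a
# conductance value.
W_q = rng.standard_normal((d_model, d_model))
W_k = rng.standard_normal((d_model, d_model))

x = rng.standard_normal((seq_len, d_model))  # runtime activations

Q = x @ W_q  # activation x fixed weight -> crossbar-friendly
K = x @ W_k  # activation x fixed weight -> crossbar-friendly

# Challenge 2: Q @ K.T multiplies two runtime-dependent matrices, so neither
# operand can be pre-programmed into a weight-stationary crossbar.
scores = Q @ K.T / np.sqrt(d_model)

# Challenge 3: softmax is one of the nonlinear operations that the crossbar's
# analog matrix-vector hardware cannot execute on its own.
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)
print(attn.sum(axis=-1))  # each row sums to 1

The projections x @ W_q and x @ W_k are activation-times-fixed-weight products and map directly onto crossbar arrays; Q @ K.T multiplies two operands that only exist at runtime, which is what the abstract calls a non-weight-stationary multiplication.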
Pages: 916-933
Number of pages: 18