Compressing Context to Enhance Inference Efficiency of Large Language Models

Cited: 0
Authors
Li, Yucheng [1 ]
Dong, Bo [1 ]
Guerin, Frank [1 ]
Lin, Chenghua [2 ,3 ]
Affiliations
[1] Univ Surrey, Dept Comp Sci, Guildford, Surrey, England
[2] Univ Manchester, Dept Comp Sci, Manchester, Lancs, England
[3] Univ Sheffield, Dept Comp Sci, Sheffield, S Yorkshire, England
Keywords: (none listed)
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) have achieved remarkable performance across various tasks. However, they face challenges in managing long documents and extended conversations, due to significantly increased computational requirements, both in memory and inference time, and potential context truncation when the input exceeds the LLM's fixed context length. This paper proposes a method called Selective Context that enhances the inference efficiency of LLMs by identifying and pruning redundancy in the input context to make the input more compact. We test our approach on common data sources requiring long context processing: arXiv papers, news articles, and long conversations, on the tasks of summarisation, question answering, and response generation. Experimental results show that Selective Context significantly reduces memory cost and generation latency while maintaining performance comparable to that achieved with the full context. Specifically, we achieve a 50% reduction in context cost, resulting in a 36% reduction in inference memory usage and a 32% reduction in inference time, while observing only a minor drop of 0.023 in BERTScore and 0.038 in faithfulness on four downstream applications, indicating that our method strikes a good balance between efficiency and performance. Code and data are available at https://github.com/liyucheng09/Selective_Context.
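The abstract describes pruning redundant spans from the input so the remaining context is more compact. A toy sketch of that general idea is below; it is not the authors' implementation (see the linked repository for that). As a stand-in for the self-information a causal language model would assign to each span, it uses unigram frequency within the document: spans dominated by frequent (hence low-information) tokens, such as verbatim repeats, are dropped first. The function names and the `ratio` parameter are illustrative, not from the paper.

```python
import math
from collections import Counter


def self_information_scores(sentences):
    """Score each sentence by its average unigram self-information, -log2 p(w).

    A crude proxy for the token-level surprisal a language model would supply:
    words that repeat often across the context carry little information.
    """
    words = [w.lower() for s in sentences for w in s.split()]
    counts = Counter(words)
    total = len(words)
    scores = []
    for s in sentences:
        toks = [w.lower() for w in s.split()]
        if not toks:
            scores.append(0.0)
            continue
        si = sum(-math.log2(counts[t] / total) for t in toks) / len(toks)
        scores.append(si)
    return scores


def compress_context(sentences, ratio=0.5):
    """Keep the most informative fraction `ratio` of sentences, in original order."""
    scores = self_information_scores(sentences)
    k = max(1, int(len(sentences) * ratio))
    keep = set(sorted(range(len(sentences)), key=lambda i: -scores[i])[:k])
    return [s for i, s in enumerate(sentences) if i in keep]


context = [
    "The model is a large language model.",
    "The model is a large language model.",  # redundant repeat: low information
    "Quantised attention kernels cut latency on Hopper GPUs.",
    "We evaluate on arXiv papers and news articles.",
]
print(compress_context(context, ratio=0.5))
```

With a 50% ratio, the two redundant sentences score lowest and are pruned, leaving the two informative ones; in the paper, the scoring is done with an actual LM rather than word counts.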
Pages: 6342-6353
Page count: 12
Related papers (50 total)
  • [21] Integrating Knowledge Graph Data with Large Language Models for Explainable Inference
    Efrain Quintero-Narvaez, Carlos
    Monroy, Raul
    PROCEEDINGS OF THE 17TH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, WSDM 2024, 2024, : 1198 - 1199
  • [22] Exploring Synergies between Causal Models and Large Language Models for Enhanced Understanding and Inference
    Sun, Yaru
    Yang, Ying
    Fu, Wenhao
    2024 2ND ASIA CONFERENCE ON COMPUTER VISION, IMAGE PROCESSING AND PATTERN RECOGNITION, CVIPPR 2024, 2024,
  • [23] Does Metacognitive Prompting Improve Causal Inference in Large Language Models?
    Ohtani, Ryusei
    Sakurai, Yuko
    Oyama, Satoshi
    2024 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI 2024, 2024, : 458 - 459
  • [24] ServerlessLLM: Low-Latency Serverless Inference for Large Language Models
    Fu, Yao
    Xue, Leyang
    Huang, Yeqi
    Brabete, Andrei-Octavian
    Ustiugov, Dmitrii
    Patel, Yuvraj
    Mai, Luo
    PROCEEDINGS OF THE 18TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION, OSDI 2024, 2024, : 135 - 153
  • [25] Compressing Neural Language Models by Sparse Word Representations
    Chen, Yunchuan
    Mou, Lili
    Xu, Yan
    Li, Ge
    Jin, Zhi
    PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016, : 226 - 235
  • [26] Exploring the applicability of large language models to citation context analysis
    Nishikawa, Kai
    Koshiba, Hitoshi
    SCIENTOMETRICS, 2024, 129 (11) : 6751 - 6777
  • [27] Context is everything in regulatory application of large language models (LLMs)
    Tong, Weida
    Renaudin, Michael
    DRUG DISCOVERY TODAY, 2024, 29 (04)
  • [28] Adaptive In-Context Learning with Large Language Models for Bundle
    Sun, Zhu
    Feng, Kaidong
    Yang, Jie
    Qu, Xinghua
    Fang, Hui
    Ong, Yew-Soon
    Liu, Wenyuan
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 966 - 976
  • [29] Learning to Retrieve In-Context Examples for Large Language Models
    Wang, Liang
    Yang, Nan
    Wei, Furu
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 1752 - 1767
  • [30] LLMEffiChecker: Understanding and Testing Efficiency Degradation of Large Language Models
    Feng, Xiaoning
    Han, Xiaohong
    Chen, Simin
    Yang, Wei
    ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2024, 33 (07)