Compressing Context to Enhance Inference Efficiency of Large Language Models

Cited: 0
Authors
Li, Yucheng [1 ]
Dong, Bo [1 ]
Guerin, Frank [1 ]
Lin, Chenghua [2 ,3 ]
Affiliations
[1] University of Surrey, Department of Computer Science, Guildford, Surrey, England
[2] University of Manchester, Department of Computer Science, Manchester, Lancashire, England
[3] University of Sheffield, Department of Computer Science, Sheffield, South Yorkshire, England
Keywords
(none provided)
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) have achieved remarkable performance across various tasks. However, they struggle with long documents and extended conversations: computational requirements, in both memory and inference time, grow significantly, and the context may be truncated when the input exceeds the LLM's fixed context length. This paper proposes Selective Context, a method that improves the inference efficiency of LLMs by identifying and pruning redundancy in the input context to make it more compact. We test our approach on common data sources that require long-context processing, namely arXiv papers, news articles, and long conversations, on the tasks of summarisation, question answering, and response generation. Experimental results show that Selective Context significantly reduces memory cost and generation latency while maintaining performance comparable to that achieved with the full context. Specifically, a 50% reduction in context cost yields a 36% reduction in inference memory usage and a 32% reduction in inference time, with only a minor drop of 0.023 in BERTScore and 0.038 in faithfulness across four downstream applications, indicating that our method strikes a good balance between efficiency and performance. Code and data are available at https://github.com/liyucheng09/Selective_Context.
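The idea the abstract describes, scoring lexical units by self-information and dropping the least informative ones, can be sketched as follows. This is a minimal, illustrative sketch only: it estimates self-information with a toy unigram model built from the context itself, whereas the paper's actual method uses a causal language model; the function names and the 0.5 reduction ratio are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter

def self_information(tokens):
    # Estimate I(t) = -log2 p(t) per token type, using a toy unigram model
    # fitted on the context itself (the paper would use a causal LM instead).
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: -math.log2(counts[t] / total) for t in counts}

def selective_context(text, reduction=0.5):
    """Drop the fraction `reduction` of tokens carrying the least
    self-information, keeping the survivors in their original order."""
    tokens = text.split()
    info = self_information(tokens)
    # Rank token positions from least to most informative (stable sort,
    # so earlier positions are dropped first on ties).
    ranked = sorted(range(len(tokens)), key=lambda i: info[tokens[i]])
    n_drop = int(len(tokens) * reduction)
    dropped = set(ranked[:n_drop])
    return " ".join(t for i, t in enumerate(tokens) if i not in dropped)

# Frequent, low-information tokens ("the") are pruned first.
compressed = selective_context("the the the cat cat sat on the mat")
```

In this toy model, highly repeated tokens have low surprisal and are removed first; with a real LM the same ranking is done over model-assigned token probabilities, and pruning can operate on phrases or sentences rather than single tokens.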
Pages: 6342-6353 (12 pages)
Related Papers
(showing 10 of 50)
  • [1] Context Compression and Extraction: Efficiency Inference of Large Language Models
    Zhou, Junyao; Du, Ruiqing; Tan, Yushan; Yang, Jintao; Yang, Zonghao; Luo, Wei; Luo, Zhunchen; Zhou, Xian; Hu, Wenpeng
    Advanced Intelligent Computing Technology and Applications, Pt I, ICIC 2024, 2024, 14875: 221-232
  • [2] LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models
    Jiang, Huiqiang; Wu, Qianhui; Lin, Chin-Yew; Yang, Yuqing; Qiu, Lili
    2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), 2023: 13358-13376
  • [3] Measuring and Improving the Energy Efficiency of Large Language Models Inference
    Argerich, Mauricio Fadel; Patino-Martinez, Marta
    IEEE Access, 2024, 12: 80194-80207
  • [4] Language Models for Lexical Inference in Context
    Schmitt, Martin; Schuetze, Hinrich
    16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021), 2021: 1267-1280
  • [5] Inference to the Best Explanation in Large Language Models
    Dalal, Dhairya; Valentino, Marco; Freitas, Andre; Buitelaar, Paul
    Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Vol 1: Long Papers, 2024: 217-235
  • [6] Assessing Inference Time in Large Language Models
    Walkowiak, Bartosz; Walkowiak, Tomasz
    System Dependability: Theory and Applications, DepCoS-RELCOMEX 2024, 2024, 1026: 296-305
  • [7] Compressing Large Polygonal Models
    Ho, J; Lee, KC; Kriegman, D
    Visualization 2001, Proceedings, 2001: 357-362
  • [8] Sources of Hallucination by Large Language Models on Inference Tasks
    McKenna, Nick; Li, Tianyi; Cheng, Liang; Hosseini, Mohammad Javad; Johnson, Mark; Steedman, Mark
    Findings of the Association for Computational Linguistics: EMNLP 2023, 2023: 2758-2774
  • [9] Compressing Huffman Models on Large Alphabets
    Navarro, Gonzalo; Ordonez, Alberto
    2013 Data Compression Conference (DCC), 2013: 381-390
  • [10] Improving Causal Inference of Large Language Models with SCM Tools
    Hua, Zhenyang; Xing, Shuyue; Jiang, Huixing; Wei, Chen; Wang, Xiaojie
    Natural Language Processing and Chinese Computing, Pt III, NLPCC 2024, 2025, 15361: 3-14