Compressing Context to Enhance Inference Efficiency of Large Language Models

Cited by: 0
Authors
Li, Yucheng [1]
Dong, Bo [1]
Guerin, Frank [1]
Lin, Chenghua [2,3]
Affiliations
[1] Univ Surrey, Dept Comp Sci, Guildford, Surrey, England
[2] Univ Manchester, Dept Comp Sci, Manchester, Lancs, England
[3] Univ Sheffield, Dept Comp Sci, Sheffield, S Yorkshire, England
Source
2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023 | 2023
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification
081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) have achieved remarkable performance across various tasks. However, they face challenges in managing long documents and extended conversations, due to significantly increased computational requirements, both in memory and inference time, and potential context truncation when the input exceeds the LLM's fixed context length. This paper proposes a method called Selective Context that enhances the inference efficiency of LLMs by identifying and pruning redundancy in the input context to make the input more compact. We test our approach using common data sources requiring long context processing: arXiv papers, news articles, and long conversations, on tasks of summarisation, question answering, and response generation. Experimental results show that Selective Context significantly reduces memory cost and decreases generation latency while maintaining performance comparable to that achieved with the full context. Specifically, we achieve a 50% reduction in context cost, resulting in a 36% reduction in inference memory usage and a 32% reduction in inference time, while observing only a minor drop of .023 in BERTScore and .038 in faithfulness on four downstream applications, indicating that our method strikes a good balance between efficiency and performance. Code and data are available at https://github.com/liyucheng09/Selective_Context.
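The abstract does not spell out the pruning criterion, so the sketch below illustrates only the general idea it describes: score each token of a long context with a small causal LM and keep the most informative fraction. This is not the authors' Selective Context implementation (see the repository linked above); the GPT-2 surrogate model, the token-level granularity, and the `keep_ratio` parameter are assumptions made purely for illustration.

```python
# Minimal sketch of context pruning by per-token self-information.
# NOT the authors' implementation; GPT-2, token granularity, and the
# 50% keep ratio are illustrative assumptions only.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()


def compress_context(text: str, keep_ratio: float = 0.5) -> str:
    """Keep roughly `keep_ratio` of the tokens, preferring the ones the LM finds surprising."""
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(input_ids).logits
    # Self-information of each token given its prefix: -log p(token | prefix).
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = input_ids[:, 1:]
    self_info = -log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)[0]
    # Rank tokens by self-information and keep the top fraction, restoring original order.
    n_keep = max(1, int(keep_ratio * self_info.numel()))
    keep_idx = torch.topk(self_info, n_keep).indices.sort().values
    kept = [input_ids[0, 0].item()] + [targets[0, i].item() for i in keep_idx]
    return tokenizer.decode(kept)


if __name__ == "__main__":
    long_context = (
        "Large language models are neural networks trained on large corpora. "
        "They are, as a matter of fact, quite expensive to run on very long inputs."
    )
    print(compress_context(long_context, keep_ratio=0.5))
```

With `keep_ratio=0.5` the sketch mirrors the 50% context reduction quoted in the abstract, but the reported memory, latency, and quality numbers come from the authors' method, not from this simplified token-level variant.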
Pages: 6342-6353
Number of pages: 12
Related Papers
50 records in total
  • [41] Context-Aware Abbreviation Expansion Using Large Language Models
    Cai, Shanqing
    Venugopalan, Subhashini
    Tomanek, Katrin
    Narayanan, Ajit
    Morris, Meredith Ringel
    Brenner, Michael P.
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 1261 - 1275
  • [42] Are Emergent Abilities in Large Language Models just In-Context Learning?
    Lu, Sheng
    Bigoulaeva, Irina
    Sachdeva, Rachneet
    Madabushi, Harish Tayyar
    Gurevych, Iryna
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 5098 - 5139
  • [43] Towards a benchmark dataset for large language models in the context of process automation
    Tizaoui, Tejennour
    Tan, Ruomu
    DIGITAL CHEMICAL ENGINEERING, 2024, 13
  • [44] Extending Context Window of Large Language Models via Semantic Compression
    Fei, Weizhi
    Niu, Xueyan
    Zhou, Pingyi
    Hou, Lu
    Bai, Bo
    Deng, Lei
    Han, Wei
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 5169 - 5181
  • [45] In-Context Impersonation Reveals Large Language Models' Strengths and Biases
    Salewski, Leonard
    Alaniz, Stephan
    Rio-Torto, Isabel
    Schulz, Eric
    Akata, Zeynep
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [46] Visual In-Context Learning for Large Vision-Language Models
    Zhou, Yucheng
    Le, Xiang
    Wang, Qianning
    Shen, Jianbing
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 15890 - 15902
  • [47] Active Learning Principles for In-Context Learning with Large Language Models
    Margatina, Katerina
    Schick, Timo
    Aletras, Nikolaos
    Dwivedi-Yu, Jane
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 5011 - 5034
  • [48] Context Is King: Large Language Models' Interpretability in Divergent Knowledge Scenarios
    Pineiro-Martin, Andres
    Santos-Criado, Francisco-Javier
    Garcia-Mateo, Carmen
    Docio-Fernandez, Laura
    Lopez-Perez, Maria del Carmen
APPLIED SCIENCES-BASEL, 2025, 15 (3)
  • [49] Compressing Pre-trained Language Models by Matrix Decomposition
    Ben Noach, Matan
    Goldberg, Yoav
    1ST CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 10TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (AACL-IJCNLP 2020), 2020, : 884 - 889
  • [50] Applying Large Language Models to Enhance the Assessment of Parallel Functional Programming Assignments
    Grandel, Skyler
    Schmidt, Douglas C.
    Leach, Kevin
    2024 INTERNATIONAL WORKSHOP ON LARGE LANGUAGE MODELS FOR CODE, LLM4CODE 2024, 2024, : 102 - 110