Compressing Context to Enhance Inference Efficiency of Large Language Models

Cited by: 0
Authors
Li, Yucheng [1]
Dong, Bo [1]
Guerin, Frank [1]
Lin, Chenghua [2,3]
Affiliations
[1] Univ Surrey, Dept Comp Sci, Guildford, Surrey, England
[2] Univ Manchester, Dept Comp Sci, Manchester, Lancs, England
[3] Univ Sheffield, Dept Comp Sci, Sheffield, S Yorkshire, England
Source
2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023 | 2023
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification
081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) have achieved remarkable performance across various tasks. However, they face challenges in managing long documents and extended conversations, due to significantly increased computational requirements, both in memory and inference time, and potential context truncation when the input exceeds the LLM's fixed context length. This paper proposes a method called Selective Context that enhances the inference efficiency of LLMs by identifying and pruning redundancy in the input context to make the input more compact. We test our approach using common data sources requiring long context processing: arXiv papers, news articles, and long conversations, on tasks of summarisation, question answering, and response generation. Experimental results show that Selective Context significantly reduces memory cost and decreases generation latency while maintaining performance comparable to that achieved with the full context. Specifically, we achieve a 50% reduction in context cost, resulting in a 36% reduction in inference memory usage and a 32% reduction in inference time, while observing only a minor drop of .023 in BERTScore and .038 in faithfulness on four downstream applications, indicating that our method strikes a good balance between efficiency and performance. Code and data are available at https://github.com/liyucheng09/Selective_Context.
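The abstract does not spell out the pruning criterion, so the sketch below illustrates only the general idea it describes: score each token of a long context with a small causal LM and keep the most informative fraction. This is not the authors' Selective Context implementation (see the repository linked above); the GPT-2 surrogate model, the token-level granularity, and the `keep_ratio` parameter are assumptions made purely for illustration.

```python
# Minimal sketch of context pruning by per-token self-information.
# NOT the authors' implementation; GPT-2, token granularity, and the
# 50% keep ratio are illustrative assumptions only.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()


def compress_context(text: str, keep_ratio: float = 0.5) -> str:
    """Keep roughly `keep_ratio` of the tokens, preferring the ones the LM finds surprising."""
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(input_ids).logits
    # Self-information of each token given its prefix: -log p(token | prefix).
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = input_ids[:, 1:]
    self_info = -log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)[0]
    # Rank tokens by self-information and keep the top fraction, restoring original order.
    n_keep = max(1, int(keep_ratio * self_info.numel()))
    keep_idx = torch.topk(self_info, n_keep).indices.sort().values
    kept = [input_ids[0, 0].item()] + [targets[0, i].item() for i in keep_idx]
    return tokenizer.decode(kept)


if __name__ == "__main__":
    long_context = (
        "Large language models are neural networks trained on large corpora. "
        "They are, as a matter of fact, quite expensive to run on very long inputs."
    )
    print(compress_context(long_context, keep_ratio=0.5))
```

With `keep_ratio=0.5` the sketch mirrors the 50% context reduction quoted in the abstract, but the reported memory, latency, and quality numbers come from the authors' method, not from this simplified token-level variant.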
Pages: 6342-6353
Number of pages: 12
Related Papers
50 records in total
  • [41] Context-Aware Abbreviation Expansion Using Large Language Models
    Cai, Shanqing
    Venugopalan, Subhashini
    Tomanek, Katrin
    Narayanan, Ajit
    Morris, Meredith Ringel
    Brenner, Michael P.
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 1261 - 1275
  • [42] Are Emergent Abilities in Large Language Models just In-Context Learning?
    Lu, Sheng
    Bigoulaeva, Irina
    Sachdeva, Rachneet
    Madabushi, Harish Tayyar
    Gurevych, Iryna
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 5098 - 5139
  • [43] Towards a benchmark dataset for large language models in the context of process automation
    Tizaoui, Tejennour
    Tan, Ruomu
    DIGITAL CHEMICAL ENGINEERING, 2024, 13
  • [44] Extending Context Window of Large Language Models via Semantic Compression
    Fei, Weizhi
    Niu, Xueyan
    Zhou, Pingyi
    Hou, Lu
    Bai, Bo
    Deng, Lei
    Han, Wei
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 5169 - 5181
  • [45] In-Context Impersonation Reveals Large Language Models' Strengths and Biases
    Salewski, Leonard
    Alaniz, Stephan
    Rio-Torto, Isabel
    Schulz, Eric
    Akata, Zeynep
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [46] Visual In-Context Learning for Large Vision-Language Models
    Zhou, Yucheng
    Le, Xiang
    Wang, Qianning
    Shen, Jianbing
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 15890 - 15902
  • [47] Active Learning Principles for In-Context Learning with Large Language Models
    Margatina, Katerina
    Schick, Timo
    Aletras, Nikolaos
    Dwivedi-Yu, Jane
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 5011 - 5034
  • [48] Context Is King: Large Language Models' Interpretability in Divergent Knowledge Scenarios
    Pineiro-Martin, Andres
    Santos-Criado, Francisco-Javier
    Garcia-Mateo, Carmen
    Docio-Fernandez, Laura
    Lopez-Perez, Maria del Carmen
APPLIED SCIENCES-BASEL, 2025, 15 (3)
  • [49] Compressing Pre-trained Language Models by Matrix Decomposition
    Ben Noach, Matan
    Goldberg, Yoav
    1ST CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 10TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (AACL-IJCNLP 2020), 2020, : 884 - 889
  • [50] Applying Large Language Models to Enhance the Assessment of Parallel Functional Programming Assignments
    Grandel, Skyler
    Schmidt, Douglas C.
    Leach, Kevin
    2024 INTERNATIONAL WORKSHOP ON LARGE LANGUAGE MODELS FOR CODE, LLM4CODE 2024, 2024, : 102 - 110