Compressing Context to Enhance Inference Efficiency of Large Language Models

Cited: 0
Authors
Li, Yucheng [1 ]
Dong, Bo [1 ]
Guerin, Frank [1 ]
Lin, Chenghua [2 ,3 ]
Affiliations
[1] University of Surrey, Department of Computer Science, Guildford, Surrey, England
[2] University of Manchester, Department of Computer Science, Manchester, Lancashire, England
[3] University of Sheffield, Department of Computer Science, Sheffield, South Yorkshire, England
Keywords
(none provided)
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) have achieved remarkable performance across various tasks. However, they struggle with long documents and extended conversations: computational requirements, in both memory and inference time, grow significantly, and the context may be truncated when the input exceeds the LLM's fixed context length. This paper proposes Selective Context, a method that improves the inference efficiency of LLMs by identifying and pruning redundancy in the input context to make it more compact. We test our approach on common data sources that require long-context processing, namely arXiv papers, news articles, and long conversations, on the tasks of summarisation, question answering, and response generation. Experimental results show that Selective Context significantly reduces memory cost and generation latency while maintaining performance comparable to that achieved with the full context. Specifically, a 50% reduction in context cost yields a 36% reduction in inference memory usage and a 32% reduction in inference time, with only a minor drop of 0.023 in BERTScore and 0.038 in faithfulness across four downstream applications, indicating that our method strikes a good balance between efficiency and performance. Code and data are available at https://github.com/liyucheng09/Selective_Context.
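The idea the abstract describes, scoring lexical units by self-information and dropping the least informative ones, can be sketched as follows. This is a minimal, illustrative sketch only: it estimates self-information with a toy unigram model built from the context itself, whereas the paper's actual method uses a causal language model; the function names and the 0.5 reduction ratio are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter

def self_information(tokens):
    # Estimate I(t) = -log2 p(t) per token type, using a toy unigram model
    # fitted on the context itself (the paper would use a causal LM instead).
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: -math.log2(counts[t] / total) for t in counts}

def selective_context(text, reduction=0.5):
    """Drop the fraction `reduction` of tokens carrying the least
    self-information, keeping the survivors in their original order."""
    tokens = text.split()
    info = self_information(tokens)
    # Rank token positions from least to most informative (stable sort,
    # so earlier positions are dropped first on ties).
    ranked = sorted(range(len(tokens)), key=lambda i: info[tokens[i]])
    n_drop = int(len(tokens) * reduction)
    dropped = set(ranked[:n_drop])
    return " ".join(t for i, t in enumerate(tokens) if i not in dropped)

# Frequent, low-information tokens ("the") are pruned first.
compressed = selective_context("the the the cat cat sat on the mat")
```

In this toy model, highly repeated tokens have low surprisal and are removed first; with a real LM the same ranking is done over model-assigned token probabilities, and pruning can operate on phrases or sentences rather than single tokens.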
Pages: 6342-6353 (12 pages)
Related Papers
(showing 10 of 50)
  • [1] Context Compression and Extraction: Efficiency Inference of Large Language Models
    Zhou, Junyao; Du, Ruiqing; Tan, Yushan; Yang, Jintao; Yang, Zonghao; Luo, Wei; Luo, Zhunchen; Zhou, Xian; Hu, Wenpeng
    Advanced Intelligent Computing Technology and Applications, Pt I, ICIC 2024, 2024, 14875: 221-232
  • [2] LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models
    Jiang, Huiqiang; Wu, Qianhui; Lin, Chin-Yew; Yang, Yuqing; Qiu, Lili
    2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), 2023: 13358-13376
  • [3] Measuring and Improving the Energy Efficiency of Large Language Models Inference
    Argerich, Mauricio Fadel; Patino-Martinez, Marta
    IEEE Access, 2024, 12: 80194-80207
  • [4] Language Models for Lexical Inference in Context
    Schmitt, Martin; Schuetze, Hinrich
    16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021), 2021: 1267-1280
  • [5] Inference to the Best Explanation in Large Language Models
    Dalal, Dhairya; Valentino, Marco; Freitas, Andre; Buitelaar, Paul
    Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Vol 1: Long Papers, 2024: 217-235
  • [6] Assessing Inference Time in Large Language Models
    Walkowiak, Bartosz; Walkowiak, Tomasz
    System Dependability: Theory and Applications, DepCoS-RELCOMEX 2024, 2024, 1026: 296-305
  • [7] Compressing Large Polygonal Models
    Ho, J; Lee, KC; Kriegman, D
    Visualization 2001, Proceedings, 2001: 357-362
  • [8] Sources of Hallucination by Large Language Models on Inference Tasks
    McKenna, Nick; Li, Tianyi; Cheng, Liang; Hosseini, Mohammad Javad; Johnson, Mark; Steedman, Mark
    Findings of the Association for Computational Linguistics: EMNLP 2023, 2023: 2758-2774
  • [9] Compressing Huffman Models on Large Alphabets
    Navarro, Gonzalo; Ordonez, Alberto
    2013 Data Compression Conference (DCC), 2013: 381-390
  • [10] Improving Causal Inference of Large Language Models with SCM Tools
    Hua, Zhenyang; Xing, Shuyue; Jiang, Huixing; Wei, Chen; Wang, Xiaojie
    Natural Language Processing and Chinese Computing, Pt III, NLPCC 2024, 2025, 15361: 3-14