Compressing Context to Enhance Inference Efficiency of Large Language Models

Cited: 0
Authors
Li, Yucheng [1 ]
Dong, Bo [1 ]
Guerin, Frank [1 ]
Lin, Chenghua [2 ,3 ]
Affiliations
[1] Univ Surrey, Dept Comp Sci, Guildford, Surrey, England
[2] Univ Manchester, Dept Comp Sci, Manchester, Lancs, England
[3] Univ Sheffield, Dept Comp Sci, Sheffield, S Yorkshire, England
Keywords: (none listed)
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) have achieved remarkable performance across various tasks. However, they face challenges in managing long documents and extended conversations, due to significantly increased computational requirements, both in memory and inference time, and potential context truncation when the input exceeds the LLM's fixed context length. This paper proposes a method called Selective Context that enhances the inference efficiency of LLMs by identifying and pruning redundancy in the input context to make the input more compact. We test our approach on common data sources requiring long context processing: arXiv papers, news articles, and long conversations, on the tasks of summarisation, question answering, and response generation. Experimental results show that Selective Context significantly reduces memory cost and generation latency while maintaining performance comparable to that achieved with the full context. Specifically, we achieve a 50% reduction in context cost, resulting in a 36% reduction in inference memory usage and a 32% reduction in inference time, while observing only a minor drop of 0.023 in BERTScore and 0.038 in faithfulness on four downstream applications, indicating that our method strikes a good balance between efficiency and performance. Code and data are available at https://github.com/liyucheng09/Selective_Context.
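The abstract describes pruning redundant spans from the input so the remaining context is more compact. A toy sketch of that general idea is below; it is not the authors' implementation (see the linked repository for that). As a stand-in for the self-information a causal language model would assign to each span, it uses unigram frequency within the document: spans dominated by frequent (hence low-information) tokens, such as verbatim repeats, are dropped first. The function names and the `ratio` parameter are illustrative, not from the paper.

```python
import math
from collections import Counter


def self_information_scores(sentences):
    """Score each sentence by its average unigram self-information, -log2 p(w).

    A crude proxy for the token-level surprisal a language model would supply:
    words that repeat often across the context carry little information.
    """
    words = [w.lower() for s in sentences for w in s.split()]
    counts = Counter(words)
    total = len(words)
    scores = []
    for s in sentences:
        toks = [w.lower() for w in s.split()]
        if not toks:
            scores.append(0.0)
            continue
        si = sum(-math.log2(counts[t] / total) for t in toks) / len(toks)
        scores.append(si)
    return scores


def compress_context(sentences, ratio=0.5):
    """Keep the most informative fraction `ratio` of sentences, in original order."""
    scores = self_information_scores(sentences)
    k = max(1, int(len(sentences) * ratio))
    keep = set(sorted(range(len(sentences)), key=lambda i: -scores[i])[:k])
    return [s for i, s in enumerate(sentences) if i in keep]


context = [
    "The model is a large language model.",
    "The model is a large language model.",  # redundant repeat: low information
    "Quantised attention kernels cut latency on Hopper GPUs.",
    "We evaluate on arXiv papers and news articles.",
]
print(compress_context(context, ratio=0.5))
```

With a 50% ratio, the two redundant sentences score lowest and are pruned, leaving the two informative ones; in the paper, the scoring is done with an actual LM rather than word counts.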
Pages: 6342-6353
Page count: 12
Related papers (50 total)
  • [21] Integrating Knowledge Graph Data with Large Language Models for Explainable Inference
    Efrain Quintero-Narvaez, Carlos
    Monroy, Raul
    PROCEEDINGS OF THE 17TH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, WSDM 2024, 2024, : 1198 - 1199
  • [22] Exploring Synergies between Causal Models and Large Language Models for Enhanced Understanding and Inference
    Sun, Yaru
    Yang, Ying
    Fu, Wenhao
    2024 2ND ASIA CONFERENCE ON COMPUTER VISION, IMAGE PROCESSING AND PATTERN RECOGNITION, CVIPPR 2024, 2024,
  • [23] Does Metacognitive Prompting Improve Causal Inference in Large Language Models?
    Ohtani, Ryusei
    Sakurai, Yuko
    Oyama, Satoshi
    2024 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI 2024, 2024, : 458 - 459
  • [24] ServerlessLLM: Low-Latency Serverless Inference for Large Language Models
    Fu, Yao
    Xue, Leyang
    Huang, Yeqi
    Brabete, Andrei-Octavian
    Ustiugov, Dmitrii
    Patel, Yuvraj
    Mai, Luo
    PROCEEDINGS OF THE 18TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION, OSDI 2024, 2024, : 135 - 153
  • [25] Compressing Neural Language Models by Sparse Word Representations
    Chen, Yunchuan
    Mou, Lili
    Xu, Yan
    Li, Ge
    Jin, Zhi
    PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016, : 226 - 235
  • [26] Exploring the applicability of large language models to citation context analysis
    Nishikawa, Kai
    Koshiba, Hitoshi
    SCIENTOMETRICS, 2024, 129 (11) : 6751 - 6777
  • [27] Context is everything in regulatory application of large language models (LLMs)
    Tong, Weida
    Renaudin, Michael
    DRUG DISCOVERY TODAY, 2024, 29 (04)
  • [28] Adaptive In-Context Learning with Large Language Models for Bundle
    Sun, Zhu
    Feng, Kaidong
    Yang, Jie
    Qu, Xinghua
    Fang, Hui
    Ong, Yew-Soon
    Liu, Wenyuan
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 966 - 976
  • [29] Learning to Retrieve In-Context Examples for Large Language Models
    Wang, Liang
    Yang, Nan
    Wei, Furu
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 1752 - 1767
  • [30] LLMEffiChecker: Understanding and Testing Efficiency Degradation of Large Language Models
    Feng, Xiaoning
    Han, Xiaohong
    Chen, Simin
    Yang, Wei
    ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2024, 33 (07)