Context Compression and Extraction: Efficiency Inference of Large Language Models

Cited by: 0
Authors
Zhou, Junyao [1 ]
Du, Ruiqing [1 ]
Tan, Yushan [2 ]
Yang, Jintao [2 ]
Yang, Zonghao [2 ]
Luo, Wei [2 ]
Luo, Zhunchen [2 ]
Zhou, Xian [2 ]
Hu, Wenpeng [2 ]
Affiliations
[1] Hebei Univ Engn, Sch Informat & Elect Engn, Handan 056000, Peoples R China
[2] Acad Mil Sci Peoples Liberation Army, Beijing 100000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
self-information; mutual-information; context compression; large language model;
DOI
10.1007/978-981-97-5663-6_19
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large language models have shown great capability in dealing with long contexts. However, when applied to question-answering tasks, excessively long contexts unavoidably contain redundant information, which can cause significant details to be lost. It is therefore a challenge to retain the information relevant to the user's query intent in long contexts. To address this problem, our study proposes a novel Context Compression and Extraction (CCE) technique that takes the impact of the user query into account. CCE computes the mutual information between the query and its context and integrates this with self-information to preserve query-relevant information in the compressed context. We have validated our approach on diverse datasets that require integrated context processing capabilities, such as an arXiv paper dataset and a news article dataset. Our methodology is effective on various tasks, including summarization, question answering, and the reconstruction of original contexts. Experimental results validate the superior performance of our method over a strong baseline across several evaluation metrics, significantly enhancing the quality of text generated in downstream tasks.
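As a rough illustration only (the record does not give the paper's actual formulas, so the scoring below is a hypothetical stand-in), a sentence-level compressor in this spirit can score each sentence by combining its self-information, estimated from token surprisal under a simple smoothed unigram model, with a query-relevance proxy standing in for the query-context mutual information, then keep the top-scoring sentences in their original order:

```python
import math
from collections import Counter

def self_information(tokens, unigram_counts, total):
    # I(x) = -log2 p(x), summed over tokens with add-one smoothing;
    # a proxy for how informative (surprising) the sentence is.
    vocab = len(unigram_counts)
    return sum(-math.log2((unigram_counts[t] + 1) / (total + vocab))
               for t in tokens)

def query_relevance(tokens, query_tokens):
    # Crude relevance proxy: fraction of distinct query tokens that
    # appear in the sentence (hypothetical stand-in for the
    # query-context mutual information the paper computes).
    overlap = len(set(tokens) & set(query_tokens))
    return overlap / max(len(set(query_tokens)), 1)

def compress(context_sentences, query, keep_ratio=0.5, alpha=0.5):
    tokenize = lambda s: s.lower().split()
    all_tokens = [t for s in context_sentences for t in tokenize(s)]
    counts, total = Counter(all_tokens), len(all_tokens)
    q_tokens = tokenize(query)
    scored = []
    for s in context_sentences:
        toks = tokenize(s)
        si = self_information(toks, counts, total) / max(len(toks), 1)
        mi = query_relevance(toks, q_tokens)
        scored.append((alpha * si + (1 - alpha) * mi, s))
    k = max(1, int(len(context_sentences) * keep_ratio))
    keep = {s for _, s in sorted(scored, reverse=True)[:k]}
    # Emit the retained sentences in their original order.
    return [s for s in context_sentences if s in keep]
```

The `alpha` weight trades off general informativeness against query relevance; the paper's method would instead use model-based probabilities from a language model rather than unigram counts.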
Pages: 221–232
Page count: 12
Related Papers
50 records total
  • [11] Exploring Synergies between Causal Models and Large Language Models for Enhanced Understanding and Inference
    Sun, Yaru
    Yang, Ying
    Fu, Wenhao
    2024 2ND ASIA CONFERENCE ON COMPUTER VISION, IMAGE PROCESSING AND PATTERN RECOGNITION, CVIPPR 2024, 2024,
  • [12] ServerlessLLM: Low-Latency Serverless Inference for Large Language Models
    Fu, Yao
    Xue, Leyang
    Huang, Yeqi
    Brabete, Andrei-Octavian
    Ustiugov, Dmitrii
    Patel, Yuvraj
    Mai, Luo
    PROCEEDINGS OF THE 18TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION, OSDI 2024, 2024, : 135 - 153
  • [13] Exploring the applicability of large language models to citation context analysis
    Nishikawa, Kai
    Koshiba, Hitoshi
    SCIENTOMETRICS, 2024, : 6751 - 6777
  • [14] Context is everything in regulatory application of large language models (LLMs)
    Tong, Weida
    Renaudin, Michael
    DRUG DISCOVERY TODAY, 2024, 29 (04)
  • [15] Adaptive In-Context Learning with Large Language Models for Bundle
    Sun, Zhu
    Feng, Kaidong
    Yang, Jie
    Qu, Xinghua
    Fang, Hui
    Ong, Yew-Soon
    Liu, Wenyuan
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 966 - 976
  • [16] LLMEffiChecker: Understanding and Testing Efficiency Degradation of Large Language Models
    Feng, Xiaoning
    Han, Xiaohong
    Chen, Simin
    Yang, Wei
    ACM Transactions on Software Engineering and Methodology, 2024, 33 (07)
  • [17] Implications of Large Language Models for Quality and Efficiency of Neurologic Care
    Moura, Lidia
    Jones, David T.
    Sheikh, Irfan S.
    Murphy, Shawn
    Kalfin, Michael
    Kummer, Benjamin R.
    Weathers, Allison L.
    Grinspan, Zachary M.
    Silsbee, Heather M.
    Jones Jr, Lyell K.
    Patel, Anup D.
    NEUROLOGY, 2024, 102 (11) : e209497
  • [18] EchoSwift: An Inference Benchmarking and Configuration Discovery Tool for Large Language Models (LLMs)
    Krishna, Karthik
    Bandili, Ramana
    COMPANION OF THE 15TH ACM/SPEC INTERNATIONAL CONFERENCE ON PERFORMANCE ENGINEERING, ICPE COMPANION 2024, 2024, : 158 - 162
  • [19] Beyond the Cloud: Edge Inference for Generative Large Language Models in Wireless Networks
    Zhang, Xinyuan
    Nie, Jiangtian
    Huang, Yudong
    Xie, Gaochang
    Xiong, Zehui
    Liu, Jiang
    Niyato, Dusit
    Shen, Xuemin
    IEEE Transactions on Wireless Communications, 2025, 24 (01) : 643 - 658
  • [20] Tabi: An Efficient Multi-Level Inference System for Large Language Models
    Wang, Yiding
    Chen, Kai
    Tan, Haisheng
    Guo, Kun
    PROCEEDINGS OF THE EIGHTEENTH EUROPEAN CONFERENCE ON COMPUTER SYSTEMS, EUROSYS 2023, 2023, : 233 - 248