Context Compression and Extraction: Efficiency Inference of Large Language Models

Cited by: 0
Authors
Zhou, Junyao [1 ]
Du, Ruiqing [1 ]
Tan, Yushan [2 ]
Yang, Jintao [2 ]
Yang, Zonghao [2 ]
Luo, Wei [2 ]
Luo, Zhunchen [2 ]
Zhou, Xian [2 ]
Hu, Wenpeng [2 ]
Affiliations
[1] Hebei Univ Engn, Sch Informat & Elect Engn, Handan 056000, Peoples R China
[2] Acad Mil Sci Peoples Liberation Army, Beijing 100000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
self-information; mutual information; context compression; large language model;
DOI
10.1007/978-981-97-5663-6_19
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Large language models have shown great capability in handling long contexts. However, when applied to question-answering tasks, excessively long contexts inevitably contain redundant information, which can obscure significant details. Retaining the information relevant to the user's query intent in long contexts is therefore a challenge. To address this problem, our study proposes a novel Context Compression and Extraction (CCE) technique that takes the impact of the user query into account. CCE computes the mutual information between the query and its context and integrates it with self-information to preserve query-relevant information in the compressed context. We validate our approach on diverse datasets that require integrated context processing, such as an arXiv paper dataset and a news article dataset. Our method is effective across tasks including summarization, question answering, and reconstruction of the original context. Experimental results show that our method outperforms a strong baseline on several evaluation metrics, significantly enhancing the quality of text generated in downstream tasks.
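The abstract describes CCE's core scoring signal: combine each context unit's self-information with its mutual information with the query, then keep the highest-scoring units. Below is a minimal, hypothetical Python sketch of that idea using a small causal language model from Hugging Face Transformers. The sentence-level granularity, the pointwise mutual-information approximation log p(c|q) - log p(c), the mixing weight alpha, and the keep_ratio heuristic are illustrative assumptions, not the authors' exact formulation.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def log_likelihood(text, prefix=""):
    # Sum of token log-probabilities of `text`, optionally conditioned on `prefix`.
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids if prefix else None
    text_ids = tokenizer(text, return_tensors="pt").input_ids
    ids = torch.cat([prefix_ids, text_ids], dim=1) if prefix_ids is not None else text_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Position i of token_lp is log p(ids[i+1] | ids[:i+1]).
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    token_lp = log_probs.gather(2, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    n_prefix = prefix_ids.shape[1] if prefix_ids is not None else 1
    # Keep only the positions that predict tokens of `text`.
    return token_lp[0, n_prefix - 1:].sum().item()

def cce_score(unit, query, alpha=0.5):
    # Self-information of the context unit: -log p(c).
    log_p_c = log_likelihood(unit)
    self_info = -log_p_c
    # Pointwise mutual-information estimate: log p(c | q) - log p(c).
    pmi = log_likelihood(unit, prefix=query + "\n") - log_p_c
    # alpha is an assumed mixing weight; the paper's combination may differ.
    return alpha * self_info + (1 - alpha) * pmi

def compress(context, query, keep_ratio=0.5):
    # Split the context into sentence-level units (a simplifying assumption).
    units = [s.strip() for s in context.split(".") if s.strip()]
    ranked = sorted(units, key=lambda u: cce_score(u, query), reverse=True)
    kept = set(ranked[: max(1, int(len(ranked) * keep_ratio))])
    # Reassemble the retained sentences in their original order.
    return ". ".join(u for u in units if u in kept) + "."

A call such as compress(long_context, "What dataset was used?", keep_ratio=0.3) would keep roughly the top-scoring third of the sentences; ranking by score but emitting in original order preserves the discourse flow of the compressed context.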
Pages: 221-232 (12 pages)