Context Compression and Extraction: Efficiency Inference of Large Language Models

Cited by: 0
Authors
Zhou, Junyao [1 ]
Du, Ruiqing [1 ]
Tan, Yushan [2 ]
Yang, Jintao [2 ]
Yang, Zonghao [2 ]
Luo, Wei [2 ]
Luo, Zhunchen [2 ]
Zhou, Xian [2 ]
Hu, Wenpeng [2 ]
Affiliations
[1] Hebei Univ Engn, Sch Informat & Elect Engn, Handan 056000, Peoples R China
[2] Acad Mil Sci Peoples Liberation Army, Beijing 100000, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
self-information; mutual-information; context compression; large language model;
DOI
10.1007/978-981-97-5663-6_19
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large language models have shown great capability in dealing with long contexts. However, when applied to question-answering tasks, excessively long contexts unavoidably contain redundant information, which can lead to a loss of significant details. It is therefore challenging to retain information relevant to the user's query intent in long contexts. To address this problem, our study proposes a novel Context Compression and Extraction (CCE) technique that takes the impact of the user query into account. CCE computes the mutual information between the query and its context, integrating this with self-information to preserve query-relevant information in the compressed context. We have validated our approach across diverse datasets that require integrated context processing capabilities, such as an arXiv paper dataset and a news article dataset. Our methodology exhibits efficacy in various tasks, including summarization, question answering, and the reconstruction of original contexts. Experimental results validate the superior performance of our method compared to a strong baseline across several evaluation metrics, significantly enhancing the quality of text generated in downstream tasks.
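The core idea described in the abstract — scoring context segments by self-information and boosting segments that share information with the query — can be illustrated with a minimal sketch. The scoring model below (a unigram model fitted on the context, a PMI-style bonus for query-overlapping tokens, and equal weighting of the two terms) is an illustrative assumption, not the paper's actual formulation:

```python
import math
from collections import Counter

def compress_context(context_sentences, query, keep_ratio=0.5):
    """Toy CCE-style filter: rank sentences by self-information plus a
    query/sentence mutual-information proxy, then keep the top fraction.
    Both the unigram scoring model and the equal weighting of the two
    terms are simplifying assumptions for illustration."""
    all_tokens = [t for s in context_sentences for t in s.lower().split()]
    counts = Counter(all_tokens)
    total = sum(counts.values())
    query_tokens = set(query.lower().split())

    def score(sentence):
        tokens = sentence.lower().split()
        if not tokens:
            return 0.0
        # Self-information: average -log p(token) under the unigram model.
        self_info = sum(-math.log(counts[t] / total) for t in tokens) / len(tokens)
        # MI proxy: extra credit for rare tokens that also appear in the query.
        mi = sum(-math.log(counts[t] / total) for t in tokens if t in query_tokens)
        return self_info + mi  # equal weighting (assumption)

    ranked = sorted(context_sentences, key=score, reverse=True)
    kept = set(ranked[: max(1, int(len(context_sentences) * keep_ratio))])
    # Preserve the original sentence order in the compressed context.
    return [s for s in context_sentences if s in kept]
```

A real implementation would estimate self-information from a language model's token log-probabilities rather than unigram counts, but the ranking-and-filtering structure would be the same.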
Pages: 221-232
Page count: 12