LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression

Cited by: 0
Authors
Pan, Zhuoshi [1 ]
Wu, Qianhui [2 ]
Jiang, Huiqiang [2 ]
Xia, Menglin [2 ]
Luo, Xufang [2 ]
Zhang, Jue [2 ]
Lin, Qingwei [2 ]
Ruhle, Victor [2 ]
Yang, Yuqing [2 ]
Lin, Chin-Yew [2 ]
Zhao, H. Vicky [1 ]
Qiu, Lili [2 ]
Zhang, Dongmei [2 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Microsoft Corp, Redmond, WA 98052 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
This paper focuses on task-agnostic prompt compression for better generalizability and efficiency. Considering the redundancy in natural language, existing approaches compress prompts by removing tokens or lexical units according to their information entropy obtained from a causal language model such as LLaMa-7B. The challenge is that information entropy may be a suboptimal compression metric: (i) it only leverages unidirectional context and may fail to capture all essential information needed for prompt compression; (ii) it is not aligned with the prompt compression objective. To address these issues, we propose a data distillation procedure to derive knowledge from an LLM to compress prompts without losing crucial information, and meanwhile introduce an extractive text compression dataset. We formulate prompt compression as a token classification problem to guarantee the faithfulness of the compressed prompt to the original one, and use a Transformer encoder as the base architecture to capture all essential information for prompt compression from the full bidirectional context. Our approach leads to lower latency by explicitly learning the compression objective with smaller models such as XLM-RoBERTa-large and mBERT. We evaluate our method on both in-domain and out-of-domain datasets, including MeetingBank, LongBench, ZeroScrolls, GSM8K, and BBH. Despite its small size, our model shows significant performance gains over strong baselines and demonstrates robust generalization ability across different LLMs. Additionally, our model is 3x-6x faster than existing prompt compression methods, while accelerating the end-to-end latency by 1.6x-2.9x with compression ratios of 2x-5x.
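The abstract frames prompt compression as binary token classification over a bidirectional encoder: each token gets a "keep" probability, and the compressed prompt is the original tokens whose probability survives a budget, in their original order. The sketch below is a minimal illustration of that formulation, not the authors' released code or checkpoints: it loads the off-the-shelf xlm-roberta-large encoder from Hugging Face transformers with a freshly initialized two-label head (so its outputs are meaningless until fine-tuned on a distilled extractive dataset), and the compress function, its keep_ratio parameter, and the subword-level selection are illustrative assumptions.

```python
# Minimal sketch (assumptions noted above): prompt compression as binary
# token classification with a bidirectional encoder.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "xlm-roberta-large"  # base encoder only; the 2-label head below is untrained/illustrative

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME, num_labels=2)  # 0 = discard, 1 = keep
model.eval()


def compress(prompt: str, keep_ratio: float = 0.5) -> str:
    """Keep the subword tokens with the highest predicted 'keep' probability,
    preserving their original order (extractive compression)."""
    enc = tokenizer(prompt, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits[0]            # (seq_len, 2)
    keep_prob = logits.softmax(-1)[:, 1]           # probability of label "keep" per token

    ids = enc["input_ids"][0]
    # Exclude special tokens (<s>, </s>) from the compression budget.
    special = torch.tensor(
        tokenizer.get_special_tokens_mask(ids.tolist(), already_has_special_tokens=True)
    ).bool()
    candidates = (~special).nonzero(as_tuple=True)[0]
    k = max(1, int(len(candidates) * keep_ratio))
    # Select the top-k tokens, then restore the original token order.
    top = candidates[keep_prob[candidates].topk(k).indices].sort().values
    return tokenizer.decode(ids[top], skip_special_tokens=True)


print(compress("The quick brown fox jumps over the lazy dog because it was bored.", keep_ratio=0.4))
```

In the paper's setup this classifier would be trained on labels distilled from an LLM, which is what aligns the learned "keep" scores with the compression objective; the untrained head here only demonstrates the inference path and the faithfulness property (every retained token comes verbatim from the input).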
Pages: 963-981
Number of pages: 19
Related papers
28 items in total
  • [1] Extract then Distill: Efficient and Effective Task-Agnostic BERT Distillation
    Chen, Cheng
    Yin, Yichun
    Shang, Lifeng
    Wang, Zhi
    Jiang, Xin
    Chen, Xiao
    Liu, Qun
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT III, 2021, 12893 : 570 - 581
  • [2] Fundamentals of Task-Agnostic Data Valuation
    Amiri, Mohammad Mohammadi
    Berdoz, Frederic
    Raskar, Ramesh
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 8, 2023, : 9226 - 9234
  • [3] Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following
    Ye, Seonghyeon
    Hwang, Hyeonbin
    Yang, Sohee
    Yun, Hyeongu
    Kim, Yireun
    Seo, Minjoon
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024, : 19386 - 19394
  • [4] Continual deep reinforcement learning with task-agnostic policy distillation
    Hafez, Muhammad Burhan
    Erekmen, Kerim
    SCIENTIFIC REPORTS, 2024, 14 (01):
  • [5] Continual Deep Reinforcement Learning with Task-Agnostic Policy Distillation
    Hafez, Muhammad Burhan
    Erekmen, Kerim
    arXiv,
  • [6] MINILM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
    Wang, Wenhui
    Wei, Furu
    Dong, Li
    Bao, Hangbo
    Yang, Nan
    Zhou, Ming
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [7] TADA: Efficient Task-Agnostic Domain Adaptation for Transformers
    Hung, Chia-Chien
    Lange, Lukas
    Stroetgen, Jannik
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 487 - 503
  • [8] Improving task-agnostic BERT distillation with layer mapping search
    Jiao, Xiaoqi
    Chang, Huating
    Yin, Yichun
    Shang, Lifeng
    Jiang, Xin
    Chen, Xiao
    Li, Linlin
    Wang, Fang
    Liu, Qun
    NEUROCOMPUTING, 2021, 461 : 194 - 203
  • [9] Towards a Task-agnostic Distillation Methodology for Creating Edge Foundation Models
    Dey, Swarnava
    Mukherjee, Arijit
    Ukil, Arijit
    Pal, Arpan
    PROCEEDINGS OF THE 2024 WORKSHOP ON EDGE AND MOBILE FOUNDATION MODELS, EDGEFM 2024, 2024, : 10 - 15
  • [10] Task-Agnostic Self-Distillation for Few-Shot Action Recognition
    Zhang, Bin
    Dan, Yuanjie
    Chen, Peng
    Li, Ronghua
    Gao, Nan
    Hum, Ruohong
    He, Xiaofei
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 5425 - 5433