LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression

Cited by: 0
Authors
Pan, Zhuoshi [1 ]
Wu, Qianhui [2 ]
Jiang, Huiqiang [2 ]
Xia, Menglin [2 ]
Luo, Xufang [2 ]
Zhang, Jue [2 ]
Lin, Qingwei [2 ]
Ruhle, Victor [2 ]
Yang, Yuqing [2 ]
Lin, Chin-Yew [2 ]
Zhao, H. Vicky [1 ]
Qiu, Lili [2 ]
Zhang, Dongmei [2 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Microsoft Corp, Redmond, WA 98052 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
This paper focuses on task-agnostic prompt compression for better generalizability and efficiency. Considering the redundancy in natural language, existing approaches compress prompts by removing tokens or lexical units according to their information entropy obtained from a causal language model such as LLaMa-7B. The challenge is that information entropy may be a suboptimal compression metric: (i) it only leverages unidirectional context and may fail to capture all essential information needed for prompt compression; (ii) it is not aligned with the prompt compression objective. To address these issues, we propose a data distillation procedure to derive knowledge from an LLM to compress prompts without losing crucial information, and meanwhile introduce an extractive text compression dataset. We formulate prompt compression as a token classification problem to guarantee the faithfulness of the compressed prompt to the original one, and use a Transformer encoder as the base architecture to capture all essential information for prompt compression from the full bidirectional context. Our approach leads to lower latency by explicitly learning the compression objective with smaller models such as XLM-RoBERTa-large and mBERT. We evaluate our method on both in-domain and out-of-domain datasets, including MeetingBank, LongBench, ZeroScrolls, GSM8K, and BBH. Despite its small size, our model shows significant performance gains over strong baselines and demonstrates robust generalization ability across different LLMs. Additionally, our model is 3x-6x faster than existing prompt compression methods, while accelerating the end-to-end latency by 1.6x-2.9x with compression ratios of 2x-5x.
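The abstract frames prompt compression as binary token classification over a bidirectional encoder: each token gets a "keep" probability, and the compressed prompt is the original tokens whose probability survives a budget, in their original order. The sketch below is a minimal illustration of that formulation, not the authors' released code or checkpoints: it loads the off-the-shelf xlm-roberta-large encoder from Hugging Face transformers with a freshly initialized two-label head (so its outputs are meaningless until fine-tuned on a distilled extractive dataset), and the compress function, its keep_ratio parameter, and the subword-level selection are illustrative assumptions.

```python
# Minimal sketch (assumptions noted above): prompt compression as binary
# token classification with a bidirectional encoder.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "xlm-roberta-large"  # base encoder only; the 2-label head below is untrained/illustrative

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME, num_labels=2)  # 0 = discard, 1 = keep
model.eval()


def compress(prompt: str, keep_ratio: float = 0.5) -> str:
    """Keep the subword tokens with the highest predicted 'keep' probability,
    preserving their original order (extractive compression)."""
    enc = tokenizer(prompt, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits[0]            # (seq_len, 2)
    keep_prob = logits.softmax(-1)[:, 1]           # probability of label "keep" per token

    ids = enc["input_ids"][0]
    # Exclude special tokens (<s>, </s>) from the compression budget.
    special = torch.tensor(
        tokenizer.get_special_tokens_mask(ids.tolist(), already_has_special_tokens=True)
    ).bool()
    candidates = (~special).nonzero(as_tuple=True)[0]
    k = max(1, int(len(candidates) * keep_ratio))
    # Select the top-k tokens, then restore the original token order.
    top = candidates[keep_prob[candidates].topk(k).indices].sort().values
    return tokenizer.decode(ids[top], skip_special_tokens=True)


print(compress("The quick brown fox jumps over the lazy dog because it was bored.", keep_ratio=0.4))
```

In the paper's setup this classifier would be trained on labels distilled from an LLM, which is what aligns the learned "keep" scores with the compression objective; the untrained head here only demonstrates the inference path and the faithfulness property (every retained token comes verbatim from the input).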
Pages: 963-981
Number of pages: 19
Related papers
28 items in total
  • [1] Extract then Distill: Efficient and Effective Task-Agnostic BERT Distillation
    Chen, Cheng
    Yin, Yichun
    Shang, Lifeng
    Wang, Zhi
    Jiang, Xin
    Chen, Xiao
    Liu, Qun
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT III, 2021, 12893 : 570 - 581
  • [2] Fundamentals of Task-Agnostic Data Valuation
    Amiri, Mohammad Mohammadi
    Berdoz, Frederic
    Raskar, Ramesh
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 8, 2023, : 9226 - 9234
  • [3] Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following
    Ye, Seonghyeon
    Hwang, Hyeonbin
    Yang, Sohee
    Yun, Hyeongu
    Kim, Yireun
    Seo, Minjoon
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024, : 19386 - 19394
  • [4] Continual deep reinforcement learning with task-agnostic policy distillation
    Hafez, Muhammad Burhan
    Erekmen, Kerim
    SCIENTIFIC REPORTS, 2024, 14 (01):
  • [5] Continual Deep Reinforcement Learning with Task-Agnostic Policy Distillation
    Hafez, Muhammad Burhan
    Erekmen, Kerim
    arXiv,
  • [6] MINILM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
    Wang, Wenhui
    Wei, Furu
    Dong, Li
    Bao, Hangbo
    Yang, Nan
    Zhou, Ming
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [7] TADA: Efficient Task-Agnostic Domain Adaptation for Transformers
    Hung, Chia-Chien
    Lange, Lukas
    Stroetgen, Jannik
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 487 - 503
  • [8] Improving task-agnostic BERT distillation with layer mapping search
    Jiao, Xiaoqi
    Chang, Huating
    Yin, Yichun
    Shang, Lifeng
    Jiang, Xin
    Chen, Xiao
    Li, Linlin
    Wang, Fang
    Liu, Qun
    NEUROCOMPUTING, 2021, 461 : 194 - 203
  • [9] Towards a Task-agnostic Distillation Methodology for Creating Edge Foundation Models
    Dey, Swarnava
    Mukherjee, Arijit
    Ukil, Arijit
    Pal, Arpan
    PROCEEDINGS OF THE 2024 WORKSHOP ON EDGE AND MOBILE FOUNDATION MODELS, EDGEFM 2024, 2024, : 10 - 15
  • [10] Task-Agnostic Self-Distillation for Few-Shot Action Recognition
    Zhang, Bin
    Dan, Yuanjie
    Chen, Peng
    Li, Ronghua
    Gao, Nan
    Hum, Ruohong
    He, Xiaofei
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 5425 - 5433