Select, Prompt, Filter: Distilling Large Language Models for Summarizing Conversations

Cited by: 0
Authors:
Pham, Minh-Quang [1 ]
Indurthi, Sathish Reddy [1 ]
Chollampatt, Shamil [1 ]
Turchi, Marco [1 ]
Affiliations:
[1] Zoom Video Commun, San Jose, CA 95113 USA
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) like ChatGPT can be expensive to train, deploy, and use for specific natural language generation tasks such as text summarization, particularly in specialized domains. A promising alternative is to fine-tune relatively small language models (LMs) on a particular task using high-quality, in-domain datasets. However, obtaining such high-quality training data can be prohibitively expensive. This issue has been mitigated by generating weakly supervised data via knowledge distillation (KD) of LLMs. We propose a three-step approach to distill ChatGPT and fine-tune smaller LMs for summarizing forum conversations. More specifically, we design a method to selectively sample a large unannotated corpus of forum conversations using a semantic similarity metric. Then, we use the same metric to retrieve suitable prompts for ChatGPT from a small annotated validation set in the same domain. The generated dataset is then filtered to remove low-quality instances. By leveraging sufficient in-domain pseudo-labelled data, our proposed select-prompt-filter KD approach yields significant improvements of up to 6.6 ROUGE-2 points over a standard KD approach given the same amount of training data.
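To make the three steps concrete, the following is a minimal Python sketch of how a select-prompt-filter pipeline could be organized. The embed encoder, the generate_summary call to the teacher LLM, the number of in-context demonstrations, and the quality heuristic in keep_pair are illustrative assumptions, not the authors' actual implementation.

    import numpy as np

    def embed(text: str) -> np.ndarray:
        """Placeholder sentence encoder; any off-the-shelf embedding model could be used (assumption)."""
        raise NotImplementedError

    def generate_summary(prompt: str) -> str:
        """Placeholder call to the teacher LLM (e.g., ChatGPT) that returns a summary (assumption)."""
        raise NotImplementedError

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        # Cosine similarity between two embedding vectors.
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    def select(unlabeled: list[str], validation: list[str], k: int) -> list[str]:
        """Step 1 (select): keep the k unannotated conversations closest to the in-domain validation set."""
        centroid = np.mean([embed(v) for v in validation], axis=0)
        return sorted(unlabeled, key=lambda c: cosine(embed(c), centroid), reverse=True)[:k]

    def build_prompt(conversation: str, annotated: list[tuple[str, str]], n_shots: int = 2) -> str:
        """Step 2 (prompt): retrieve the most similar annotated pairs as in-context demonstrations."""
        q = embed(conversation)
        shots = sorted(annotated, key=lambda p: cosine(embed(p[0]), q), reverse=True)[:n_shots]
        demos = "\n\n".join(f"Conversation:\n{c}\nSummary:\n{s}" for c, s in shots)
        return f"{demos}\n\nConversation:\n{conversation}\nSummary:"

    def keep_pair(conversation: str, summary: str) -> bool:
        """Step 3 (filter): drop low-quality pseudo-labels; a simple length/overlap check stands in here (assumption)."""
        words = set(summary.lower().split())
        if len(words) < 5:
            return False
        overlap = len(words & set(conversation.lower().split())) / len(words)
        return overlap > 0.3

    def distill(unlabeled: list[str], validation_pairs: list[tuple[str, str]], k: int) -> list[tuple[str, str]]:
        """Assemble the pseudo-labelled training set used to fine-tune a smaller LM."""
        validation_texts = [c for c, _ in validation_pairs]
        pseudo = []
        for conv in select(unlabeled, validation_texts, k):
            summary = generate_summary(build_prompt(conv, validation_pairs))
            if keep_pair(conv, summary):
                pseudo.append((conv, summary))
        return pseudo

The resulting (conversation, summary) pairs would then serve as weak supervision for fine-tuning the smaller in-domain summarizer.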
Pages: 12257-12265
Page count: 9