Select, Prompt, Filter: Distilling Large Language Models for Summarizing Conversations

Cited by: 0
Authors:
Pham, Minh-Quang [1 ]
Indurthi, Sathish Reddy [1 ]
Chollampatt, Shamil [1 ]
Turchi, Marco [1 ]
Affiliations:
[1] Zoom Video Commun, San Jose, CA 95113 USA
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Large language models (LLMs) like ChatGPT can be expensive to train, deploy, and use for specific natural language generation tasks such as text summarization, particularly in specialized domains. A promising alternative is to fine-tune relatively small language models (LMs) on a particular task using high-quality, in-domain datasets. However, obtaining such high-quality training data can be prohibitively expensive. This issue has been mitigated by generating weakly supervised data via knowledge distillation (KD) of LLMs. We propose a three-step approach to distill ChatGPT and fine-tune smaller LMs for summarizing forum conversations. More specifically, we design a method to selectively sample a large unannotated corpus of forum conversations using a semantic similarity metric. Then, we use the same metric to retrieve suitable prompts for ChatGPT from a small annotated validation set in the same domain. The generated dataset is then filtered to remove low-quality instances. By leveraging sufficient in-domain pseudo-labelled data, our proposed select-prompt-filter KD approach yields significant improvements of up to 6.6 ROUGE-2 points over a standard KD approach given the same amount of training data.
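To make the three steps concrete, the following is a minimal Python sketch of how a select-prompt-filter pipeline could be organized. The embed encoder, the generate_summary call to the teacher LLM, the number of in-context demonstrations, and the quality heuristic in keep_pair are illustrative assumptions, not the authors' actual implementation.

    import numpy as np

    def embed(text: str) -> np.ndarray:
        """Placeholder sentence encoder; any off-the-shelf embedding model could be used (assumption)."""
        raise NotImplementedError

    def generate_summary(prompt: str) -> str:
        """Placeholder call to the teacher LLM (e.g., ChatGPT) that returns a summary (assumption)."""
        raise NotImplementedError

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        # Cosine similarity between two embedding vectors.
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    def select(unlabeled: list[str], validation: list[str], k: int) -> list[str]:
        """Step 1 (select): keep the k unannotated conversations closest to the in-domain validation set."""
        centroid = np.mean([embed(v) for v in validation], axis=0)
        return sorted(unlabeled, key=lambda c: cosine(embed(c), centroid), reverse=True)[:k]

    def build_prompt(conversation: str, annotated: list[tuple[str, str]], n_shots: int = 2) -> str:
        """Step 2 (prompt): retrieve the most similar annotated pairs as in-context demonstrations."""
        q = embed(conversation)
        shots = sorted(annotated, key=lambda p: cosine(embed(p[0]), q), reverse=True)[:n_shots]
        demos = "\n\n".join(f"Conversation:\n{c}\nSummary:\n{s}" for c, s in shots)
        return f"{demos}\n\nConversation:\n{conversation}\nSummary:"

    def keep_pair(conversation: str, summary: str) -> bool:
        """Step 3 (filter): drop low-quality pseudo-labels; a simple length/overlap check stands in here (assumption)."""
        words = set(summary.lower().split())
        if len(words) < 5:
            return False
        overlap = len(words & set(conversation.lower().split())) / len(words)
        return overlap > 0.3

    def distill(unlabeled: list[str], validation_pairs: list[tuple[str, str]], k: int) -> list[tuple[str, str]]:
        """Assemble the pseudo-labelled training set used to fine-tune a smaller LM."""
        validation_texts = [c for c, _ in validation_pairs]
        pseudo = []
        for conv in select(unlabeled, validation_texts, k):
            summary = generate_summary(build_prompt(conv, validation_pairs))
            if keep_pair(conv, summary):
                pseudo.append((conv, summary))
        return pseudo

The resulting (conversation, summary) pairs would then serve as weak supervision for fine-tuning the smaller in-domain summarizer.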
Pages: 12257-12265
Page count: 9