Revisit Few-shot Intent Classification with PLMs: Direct Fine-tuning vs. Continual Pre-training

Cited by: 0
Authors
Zhang, Haode [1 ]
Liang, Haowen [1 ]
Zhan, Liming
Lam, Albert Y. S. [2 ]
Wu, Xiao-Ming [1 ]
Affiliations
[1] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
[2] Fano Labs, Hong Kong, Peoples R China
DOI: not available
Abstract
We consider the task of few-shot intent detection, which involves training a deep learning model to classify utterances based on their underlying intents using only a small amount of labeled data. The current approach to address this problem is through continual pre-training, i.e., fine-tuning pre-trained language models (PLMs) on external resources (e.g., conversational corpora, public intent detection datasets, or natural language understanding datasets) before using them as utterance encoders for training an intent classifier. In this paper, we show that continual pre-training may not be essential, since the overfitting problem of PLMs on this task may not be as serious as expected. Specifically, we find that directly fine-tuning PLMs on only a handful of labeled examples already yields decent results compared to methods that employ continual pre-training, and the performance gap diminishes rapidly as the number of labeled data increases. To maximize the utilization of the limited available data, we propose a context augmentation method and leverage sequential self-distillation to boost performance. Comprehensive experiments on real-world benchmarks show that given only two or more labeled samples per class, direct fine-tuning outperforms many strong baselines that utilize external data sources for continual pre-training. The code can be found at https://github.com/hdzhang-code/DFTPlus.
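The abstract outlines the approach at a high level; the sketch below illustrates what "directly fine-tuning a PLM on a handful of labeled examples" typically looks like in practice. This is a minimal illustration assuming a BERT-style encoder loaded via HuggingFace Transformers and trained with PyTorch; the model name, the toy 2-shot utterances, and the hyperparameters are illustrative assumptions, not the paper's exact setup. The authors' actual implementation, including the context augmentation and sequential self-distillation components, is in the repository linked above.

```python
# Minimal sketch: direct fine-tuning of a PLM for few-shot intent
# classification. Model choice, data, and hyperparameters are
# illustrative assumptions; see https://github.com/hdzhang-code/DFTPlus
# for the authors' full method.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Toy 2-shot training set: two labeled utterances per intent class.
utterances = [
    "what's the weather like today",   # intent 0: weather
    "will it rain tomorrow",           # intent 0: weather
    "play some jazz music",            # intent 1: play_music
    "put on my workout playlist",      # intent 1: play_music
]
labels = torch.tensor([0, 0, 1, 1])

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

batch = tokenizer(utterances, padding=True, truncation=True,
                  return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(20):  # tiny training set, so many epochs are cheap
    optimizer.zero_grad()
    out = model(**batch, labels=labels)  # cross-entropy on the [CLS] head
    out.loss.backward()
    optimizer.step()

# Inference on an unseen utterance.
model.eval()
with torch.no_grad():
    test = tokenizer(["is it going to snow tonight"], return_tensors="pt")
    pred = model(**test).logits.argmax(dim=-1)
print(pred.item())  # likely 0 (weather) after fine-tuning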
Pages: 11105-11119 (15 pages)
Related papers (50 in total)
  • [1] Few-Shot Intent Detection via Contrastive Pre-Training and Fine-Tuning
    Zhang, Jian-Guo
    Bui, Trung
    Yoon, Seunghyun
    Chen, Xiang
    Liu, Zhiwei
    Xia, Congying
    Tran, Quan Hung
    Chang, Walter
    Yu, Philip
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 1906 - 1912
  • [2] Effectiveness of Pre-training for Few-shot Intent Classification
    Zhang, Haode
    Zhang, Yuwei
    Zhan, Li-Ming
    Chen, Jiaxin
    Shi, Guangyuan
    Wu, Xiao-Ming
    Lam, Albert Y. S.
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 1114 - 1120
  • [3] Fine-tuning Pre-trained Language Models for Few-shot Intent Detection: Supervised Pre-training and Isotropization
    Zhang, Haode
    Liang, Haowen
    Zhang, Yuwei
    Zhan, Liming
    Wu, Xiao-Ming
    Lu, Xiaolei
    Lam, Albert Y. S.
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 532 - 542
  • [4] Improving Pre-Training and Fine-Tuning for Few-Shot SAR Automatic Target Recognition
    Zhang, Chao
    Dong, Hongbin
    Deng, Baosong
    REMOTE SENSING, 2023, 15 (06)
  • [5] Pre-training Intent-Aware Encoders for Zero- and Few-Shot Intent Classification
    Sung, Mujeen
Gung, James
    Mansimov, Elman
    Pappas, Nikolaos
    Shu, Raphael
    Romeo, Salvatore
    Zhang, Yi
    Castelli, Vittorio
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 10433 - 10442
  • [6] Hybrid Fine-Tuning Strategy for Few-Shot Classification
    Zhao, Lei
    Ou, Zhonghua
    Zhang, Lixun
    Li, Shuxiao
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022
  • [7] Few-shot Fine-tuning vs. In-context Learning: A Fair Comparison and Evaluation
    Mosbach, Marius
    Pimentel, Tiago
    Ravfogel, Shauli
    Klakow, Dietrich
    Elazar, Yanai
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 12284 - 12314
  • [8] Fine-Tuning for Few-Shot Image Classification by Multimodal Prototype Regularization
    Wu, Qianhao
    Qi, Jiaxin
    Zhang, Dong
    Zhang, Hanwang
    Tang, Jinhui
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 8543 - 8556
  • [9] Label Semantic Aware Pre-training for Few-shot Text Classification
    Mueller, Aaron
    Krone, Jason
    Romeo, Salvatore
    Mansour, Saab
    Mansimov, Elman
    Zhang, Yi
    Roth, Dan
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 8318 - 8334
  • [10] Pathologies of Pre-trained Language Models in Few-shot Fine-tuning
    Chen, Hanjie
    Zheng, Guoqing
    Awadallah, Ahmed Hassan
    Ji, Yangfeng
    PROCEEDINGS OF THE THIRD WORKSHOP ON INSIGHTS FROM NEGATIVE RESULTS IN NLP (INSIGHTS 2022), 2022, : 144 - 153