Revisit Few-shot Intent Classification with PLMs: Direct Fine-tuning vs. Continual Pre-training

Cited by: 0
Authors
Zhang, Haode [1 ]
Liang, Haowen [1 ]
Zhan, Liming [1 ]
Lam, Albert Y. S. [2 ]
Wu, Xiao-Ming [1 ]
Affiliations
[1] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
[2] Fano Labs, Hong Kong, Peoples R China
Keywords: None listed
DOI: Not available
Abstract
We consider the task of few-shot intent detection, which involves training a deep learning model to classify utterances based on their underlying intents using only a small amount of labeled data. The current approach to address this problem is through continual pre-training, i.e., fine-tuning pre-trained language models (PLMs) on external resources (e.g., conversational corpora, public intent detection datasets, or natural language understanding datasets) before using them as utterance encoders for training an intent classifier. In this paper, we show that continual pre-training may not be essential, since the overfitting problem of PLMs on this task may not be as serious as expected. Specifically, we find that directly fine-tuning PLMs on only a handful of labeled examples already yields decent results compared to methods that employ continual pre-training, and the performance gap diminishes rapidly as the number of labeled data increases. To maximize the utilization of the limited available data, we propose a context augmentation method and leverage sequential self-distillation to boost performance. Comprehensive experiments on real-world benchmarks show that given only two or more labeled samples per class, direct fine-tuning outperforms many strong baselines that utilize external data sources for continual pre-training. The code can be found at https://github.com/hdzhang-code/DFTPlus.
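The abstract outlines a two-step recipe: fine-tune a PLM directly on the handful of labeled utterances, then apply sequential self-distillation, where each new model generation learns from the soft predictions of the previous one. Below is a minimal sketch of that loop, assuming bert-base-uncased, a toy two-intent dataset, and an unweighted distillation term; the paper's actual hyperparameters, loss weighting, and context augmentation step are not reproduced here, so treat this as an illustration rather than the authors' implementation (see the repository linked above for the real code).

```python
# Sketch: direct fine-tuning of a PLM for few-shot intent classification,
# followed by sequential self-distillation. Model name, data, and
# hyperparameters are illustrative assumptions, not the paper's setup.
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Toy few-shot data: two utterances per intent class.
utterances = ["book a flight to tokyo", "play some jazz music",
              "reserve a seat to paris", "put on my workout playlist"]
labels = torch.tensor([0, 1, 0, 1])  # 0 = book_flight, 1 = play_music

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tok(utterances, padding=True, truncation=True, return_tensors="pt")

def train(model, soft_targets=None, epochs=5, lr=2e-5):
    """One fine-tuning run on hard labels, optionally adding a KL term
    that pulls the student toward a teacher's soft predictions."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        opt.zero_grad()
        logits = model(**batch).logits
        loss = F.cross_entropy(logits, labels)
        if soft_targets is not None:  # distillation term
            loss = loss + F.kl_div(F.log_softmax(logits, dim=-1),
                                   soft_targets, reduction="batchmean")
        loss.backward()
        opt.step()
    return model

# Generation 0: plain direct fine-tuning on the few labeled examples.
teacher = train(AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2))

# Sequential self-distillation: each generation starts from the pre-trained
# checkpoint and learns from the previous generation's soft labels.
for _ in range(2):
    teacher.eval()
    with torch.no_grad():
        soft = F.softmax(teacher(**batch).logits, dim=-1)
    student = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)
    teacher = train(student, soft_targets=soft)
```

The design point the sketch illustrates is that no external corpus or continual pre-training stage is involved: every generation is trained only on the original few labeled examples, with the teacher's soft distribution acting as extra supervision.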
Pages: 11105 - 11119
Number of pages: 15
Related Papers (showing 41-50 of 50)
  • [41] Fine-Tuning of CLIP in Few-Shot Scenarios via Supervised Contrastive Learning
    Luo, Jing
    Wu, Guangxing
    Liu, Hongmei
    Wang, Ruixuan
    PATTERN RECOGNITION AND COMPUTER VISION, PT III, PRCV 2024, 2025, 15033 : 104 - 117
  • [42] Omni-Training: Bridging Pre-Training and Meta-Training for Few-Shot Learning
    Shu, Yang
    Cao, Zhangjie
    Gao, Jinghan
    Wang, Jianmin
    Yu, Philip S.
    Long, Mingsheng
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (12) : 15275 - 15291
  • [43] Tri-Train: Automatic Pre-Fine Tuning between Pre-Training and Fine-Tuning for SciNER
    Zeng, Qingkai
    Yu, Wenhao
    Yu, Mengxia
    Jiang, Tianwen
    Weninger, Tim
    Jiang, Meng
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 4778 - 4787
  • [44] Chain of Thought Guided Few-Shot Fine-Tuning of LLMs for Multimodal Aspect-Based Sentiment Classification
    Wu, Hao
    Yang, Danping
    Liu, Peng
    Li, Xianxian
    MULTIMEDIA MODELING, MMM 2025, PT I, 2025, 15520 : 182 - 194
  • [45] Bridging the Gap between Pre-Training and Fine-Tuning for Commonsense Generation
    Yang, Haoran
    Wang, Yan
    Li, Piji
    Bi, Wei
    Lam, Wai
    Xu, Chen
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 376 - 383
  • [46] On the Connection between Pre-training Data Diversity and Fine-tuning Robustness
    Ramanujan, Vivek
    Nguyen, Thao
    Oh, Sewoong
    Schmidt, Ludwig
    Farhadi, Ali
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [47] Partial Is Better Than All: Revisiting Fine-tuning Strategy for Few-shot Learning
    Shen, Zhiqiang
    Liu, Zechun
    Qin, Jie
    Savvides, Marios
    Cheng, Kwang-Ting
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 9594 - 9602
  • [48] Data race detection via few-shot parameter-efficient fine-tuning
    Shen, Yuanyuan
    Peng, Manman
    Zhang, Fan
    Wu, Qiang
    JOURNAL OF SYSTEMS AND SOFTWARE, 2025, 222
  • [49] A fine-tuning prototypical network for few-shot cross-domain fault diagnosis
    Zhong, Jianhua
    Gu, Kairong
    Jiang, Haifeng
    Liang, Wei
    Zhong, Shuncong
    MEASUREMENT SCIENCE AND TECHNOLOGY, 2024, 35 (11)
  • [50] Improving Zero and Few-Shot Abstractive Summarization with Intermediate Fine-tuning and Data Augmentation
    Fabbri, Alexander R.
    Han, Simeng
    Li, Haoyuan
    Li, Haoran
    Ghazvininejad, Marjan
    Joty, Shafiq
    Radev, Dragomir
    Mehdad, Yashar
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 704 - 717