Combining Self-supervised Learning and Active Learning for Disfluency Detection

Cited by: 4
Authors
Wang, Shaolei [1]
Wang, Zhongyuan [1]
Che, Wanxiang [1]
Zhao, Sendong [1]
Liu, Ting [1]
Affiliation
[1] Harbin Inst Technol, 2 YiKuang St,Tech & Innovat Bldg,HIT Sci Pk, Harbin 150001, Heilongjiang, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Disfluency detection; self-supervised learning; active learning; pre-training technology;
DOI
10.1145/3487290
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Spoken language is fundamentally different from written language in that it contains frequent disfluencies, or parts of an utterance that are corrected by the speaker. Disfluency detection (removing these disfluencies) is desirable to clean the input for use in downstream NLP tasks. Most existing approaches to disfluency detection rely heavily on human-annotated data, which is scarce and expensive to obtain in practice. To tackle the training data bottleneck, in this work, we investigate methods for combining self-supervised learning and active learning for disfluency detection. First, we construct large-scale pseudo training data by randomly adding or deleting words from unlabeled data and propose two self-supervised pre-training tasks: (i) a tagging task to detect the added noisy words and (ii) sentence classification to distinguish original sentences from grammatically incorrect sentences. We then combine these two tasks to jointly pre-train a neural network. The pre-trained neural network is then fine-tuned using human-annotated disfluency detection training data. The self-supervised learning method can capture task-specific knowledge for disfluency detection and achieves better performance when fine-tuned on a small annotated dataset compared to other supervised methods. However, because the pseudo training data are generated based on simple heuristics and cannot fully cover all disfluency patterns, there is still a performance gap compared to supervised models trained on the full training dataset. We further explore how to bridge the performance gap by integrating active learning during the fine-tuning process. Active learning strives to reduce annotation costs by choosing the most critical examples to label and can address the weakness of self-supervised learning with a small annotated dataset.
We show that by combining self-supervised learning with active learning, our model is able to match state-of-the-art performance with just about 10% of the original training data on both the commonly used English Switchboard test set and a set of in-house annotated Chinese data.
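As a rough illustration of the two ideas in the abstract, the following Python sketch builds pseudo training data by randomly adding or deleting words, and then picks examples to annotate with a simple uncertainty criterion. Everything named here is an assumption made for illustration: the function names (`add_noise`, `delete_noise`, `least_confident`), the noise rates, the choice of labeling deleted-word sentences as the "incorrect" class, and the least-confidence selection rule. The abstract does not specify these details, so this should not be read as the authors' implementation.

```python
import random
from typing import List, Sequence, Tuple


def add_noise(tokens: Sequence[str], vocab: Sequence[str],
              rng: random.Random, p_add: float = 0.15) -> Tuple[List[str], List[int]]:
    """Pseudo data for the tagging task (assumed scheme).

    Before each original token, insert a random vocabulary word with
    probability p_add. Label 1 marks an added word, 0 an original word.
    """
    noisy: List[str] = []
    labels: List[int] = []
    for tok in tokens:
        if rng.random() < p_add:
            noisy.append(rng.choice(vocab))
            labels.append(1)
        noisy.append(tok)
        labels.append(0)
    return noisy, labels


def delete_noise(tokens: Sequence[str], rng: random.Random,
                 p_del: float = 0.15) -> List[str]:
    """Pseudo data for sentence classification (assumed scheme).

    Remove roughly p_del of the tokens (at least one, never all) so the
    result can serve as a likely ungrammatical negative example.
    """
    if len(tokens) <= 1:
        return list(tokens)  # nothing can be deleted safely
    n_del = min(max(1, int(len(tokens) * p_del)), len(tokens) - 1)
    drop = set(rng.sample(range(len(tokens)), n_del))
    return [t for i, t in enumerate(tokens) if i not in drop]


def least_confident(token_probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Active-learning selection by least confidence (assumed criterion).

    token_probs[i] holds, for sentence i, the model's probability of its
    predicted tag at each token. Sentences whose weakest token prediction
    is lowest are returned first.
    """
    scored = sorted((min(p) if p else 1.0, i) for i, p in enumerate(token_probs))
    return [i for _, i in scored[:k]]


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "i want a flight to boston".split()
    vocab = ["um", "well", "the", "denver", "uh"]
    print(add_noise(sentence, vocab, rng))
    print(delete_noise(sentence, rng))
    print(least_confident([[0.9, 0.8], [0.55, 0.99], [0.99, 0.97]], k=2))
```

In a full pipeline, the `(noisy, labels)` pairs would feed the tagging objective, original versus corrupted sentences would feed the classification objective, and `least_confident` would be called on unlabeled sentences between fine-tuning rounds to decide which ones to send for annotation.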
Pages: 25