Discriminative Self-training for Punctuation Prediction

Cited by: 2
Authors
Chen, Qian [1]
Wang, Wen [1]
Chen, Mengzhe [1]
Zhang, Qinglin [1]
Affiliations
[1] Alibaba Grp, Speech Lab, Hangzhou, Peoples R China
Keywords
punctuation prediction; self-training; label smoothing; Transformer; BERT;
DOI
10.21437/Interspeech.2021-246
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject Classification Codes
100104 ; 100213 ;
Abstract
Punctuation prediction for automatic speech recognition (ASR) output transcripts plays a crucial role in improving the readability of ASR transcripts and the performance of downstream natural language processing applications. However, achieving good performance on punctuation prediction often requires large amounts of labeled speech transcripts, which are expensive and laborious to obtain. In this paper, we propose a Discriminative Self-Training approach with weighted loss and discriminative label smoothing to exploit unlabeled speech transcripts. Experimental results on the English IWSLT2011 benchmark test set and an internal Chinese spoken language dataset demonstrate that the proposed approach achieves significant improvements in punctuation prediction accuracy over strong baselines, including BERT, RoBERTa, and ELECTRA models, and outperforms the vanilla self-training approach. We establish a new state-of-the-art (SOTA) on the IWSLT2011 test set, surpassing the previous SOTA model by 1.3% absolute on F1.
Pages: 771 - 775
Page count: 5
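The abstract names two loss-side ingredients, a weighted loss and label smoothing, applied to punctuation prediction as token classification. The sketch below shows a generic weighted, label-smoothed cross-entropy; the uniform smoothing distribution and the gold-class weighting scheme are illustrative assumptions, not the paper's exact "discriminative" formulation, which is not specified in this record.

```python
import numpy as np

def weighted_smoothed_ce(logits, targets, class_weights, smoothing=0.1):
    """Weighted cross-entropy with label smoothing (illustrative sketch).

    logits:        (N, C) unnormalized scores per token
    targets:       (N,)   gold punctuation-class indices
    class_weights: (C,)   per-class weights (e.g. to upweight rare marks)
    """
    n, c = logits.shape
    # Numerically stable log-softmax.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # Smoothed targets: gold class gets 1 - smoothing, the rest share
    # `smoothing` uniformly (a common default, assumed here).
    smooth = np.full((n, c), smoothing / (c - 1))
    smooth[np.arange(n), targets] = 1.0 - smoothing
    # Per-token weight taken from the gold class (assumed scheme).
    w = class_weights[targets]
    loss = -(smooth * log_probs).sum(axis=1)
    return float((w * loss).sum() / w.sum())
```

In a self-training setup, such a loss would typically be applied to both gold labels and pseudo-labels produced by a teacher model on unlabeled transcripts, with the weighting used to discount less reliable targets.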