Discriminative Self-training for Punctuation Prediction

Cited by: 2
Authors
Chen, Qian [1]
Wang, Wen [1]
Chen, Mengzhe [1]
Zhang, Qinglin [1]
Affiliations
[1] Alibaba Grp, Speech Lab, Hangzhou, Peoples R China
Source
Keywords
punctuation prediction; self-training; label smoothing; Transformer; BERT
DOI
10.21437/Interspeech.2021-246
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
Punctuation prediction for automatic speech recognition (ASR) output transcripts plays a crucial role in improving both the readability of the transcripts and the performance of downstream natural language processing applications. However, achieving good performance on punctuation prediction often requires large amounts of labeled speech transcripts, which are expensive and laborious to obtain. In this paper, we propose a Discriminative Self-Training approach with weighted loss and discriminative label smoothing to exploit unlabeled speech transcripts. Experimental results on the English IWSLT2011 benchmark test set and an internal Chinese spoken language dataset demonstrate that the proposed approach significantly improves punctuation prediction accuracy over strong baselines, including BERT, RoBERTa, and ELECTRA models. The proposed Discriminative Self-Training approach also outperforms the vanilla self-training approach. We establish a new state of the art (SOTA) on the IWSLT2011 test set, outperforming the previous SOTA model by 1.3% absolute on F1.
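The abstract names a weighted loss and discriminative label smoothing but does not define them. The sketch below is one plausible reading, not the authors' formulation: human-labeled and pseudo-labeled tokens get separate label-smoothing strengths, and the pseudo-labeled term is down-weighted. The function names, the parameters `eps_gold`, `eps_pseudo`, and `pseudo_weight`, and their default values are all hypothetical.

```python
import math


def smoothed_target(label, num_classes, eps):
    """Label-smoothed target: 1 - eps on the label, eps spread over the other classes."""
    off_value = eps / (num_classes - 1)
    return [1.0 - eps if k == label else off_value for k in range(num_classes)]


def cross_entropy(probs, target):
    """Cross-entropy between a predicted distribution and a (soft) target."""
    return -sum(t * math.log(p) for p, t in zip(probs, target) if t > 0.0)


def self_training_loss(gold, pseudo, eps_gold=0.0, eps_pseudo=0.1, pseudo_weight=0.5):
    """Combine losses on human-labeled (gold) and pseudo-labeled tokens.

    gold, pseudo: lists of (predicted_probs, label) pairs.
    Assumption (not from the paper): the two sets use different smoothing
    strengths and the pseudo-labeled term is scaled by pseudo_weight.
    """
    def mean_ce(batch, eps):
        if not batch:
            return 0.0
        total = sum(cross_entropy(p, smoothed_target(y, len(p), eps)) for p, y in batch)
        return total / len(batch)

    return mean_ce(gold, eps_gold) + pseudo_weight * mean_ce(pseudo, eps_pseudo)


# Four hypothetical punctuation classes, one gold token and one pseudo-labeled token.
gold_batch = [([0.7, 0.1, 0.1, 0.1], 0)]
pseudo_batch = [([0.25, 0.25, 0.25, 0.25], 1)]
loss = self_training_loss(gold_batch, pseudo_batch)
```

In this reading, smoothing the pseudo-labels more heavily than the gold labels keeps the model from becoming overconfident in labels that a teacher model may have gotten wrong.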
Pages: 771 - 775
Page count: 5