UNSUPERVISED MODEL ADAPTATION FOR END-TO-END ASR

Cited by: 1
Authors
Sivaraman, Ganesh [1]
Casal, Ricardo [1]
Garland, Matt [1]
Khoury, Elie [1]
Affiliations
[1] Pindrop, Atlanta, GA 30308 USA
Keywords
End-to-end; speech recognition; unsupervised adaptation; confidence measure; call centers; telephony audio
DOI
10.1109/ICASSP43922.2022.9746188
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
End-to-end (E2E) Automatic Speech Recognition (ASR) systems are widely applied in various devices and communication domains. However, state-of-the-art ASR systems are known to underperform when there is a mismatch between the training and test domains. As a result, acoustic models deployed in production are often adapted to the target domain to improve accuracy. This paper proposes a method to perform unsupervised model adaptation for E2E ASR using first-pass transcriptions of adaptation data produced by the baseline ASR model itself. The paper proposes two transcription confidence measures that can be used to select an optimal in-domain adaptation set. Experiments were performed using the Quartznet ASR architecture on the HarperValleyBank corpus. Results show that the unsupervised adaptation technique, combined with confidence-measure-based data selection, yields an 8% absolute reduction in word error rate on the HarperValleyBank test set. The proposed method can be applied to any E2E ASR system and is suitable for model adaptation on call center audio with little to no manual transcription.
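The data-selection step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the two confidence measures, so the score used here (the mean over frames of the best token's log-probability) is only an assumed stand-in, and the names `mean_max_log_prob`, `select_adaptation_set`, the dictionary layout, and the threshold value are all hypothetical. The sketch shows only the general idea: score each first-pass transcription, then keep the confident utterances as pseudo-labeled adaptation data.

```python
import math
from typing import Dict, List, Sequence


def mean_max_log_prob(frame_log_probs: Sequence[Sequence[float]]) -> float:
    """Assumed confidence score (not the paper's measure):
    average over frames of the highest token log-probability."""
    if not frame_log_probs:
        return float("-inf")  # no frames means no evidence, so lowest confidence
    return sum(max(frame) for frame in frame_log_probs) / len(frame_log_probs)


def select_adaptation_set(utterances: List[Dict], threshold: float) -> List[Dict]:
    """Keep utterances whose first-pass hypothesis scores at or above the threshold.

    Each utterance is a dict with hypothetical keys:
      "id"         - utterance identifier
      "hypothesis" - first-pass transcription from the baseline model
      "log_probs"  - per-frame token log-probabilities from the baseline model
    """
    selected = []
    for utt in utterances:
        score = mean_max_log_prob(utt["log_probs"])
        if score >= threshold:
            selected.append(
                {"id": utt["id"], "pseudo_label": utt["hypothesis"], "confidence": score}
            )
    return selected


# Toy first-pass outputs (made-up values, two frames and two tokens each).
utterances = [
    {
        "id": "call_001",
        "hypothesis": "i would like to check my balance",
        "log_probs": [[math.log(0.9), math.log(0.1)], [math.log(0.8), math.log(0.2)]],
    },
    {
        "id": "call_002",
        "hypothesis": "uh transfer",
        "log_probs": [[math.log(0.55), math.log(0.45)], [math.log(0.5), math.log(0.5)]],
    },
]

# The threshold is an arbitrary illustrative choice, not a value from the paper.
adaptation_set = select_adaptation_set(utterances, threshold=math.log(0.7))
print([u["id"] for u in adaptation_set])
```

In a setup like this, the selected pseudo-labeled utterances would then be used to fine-tune the baseline model; the threshold trades off the amount of adaptation data against the noise in its labels.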
Pages: 6987-6991
Number of pages: 5