Out-of-vocabulary Words Detection with Attention and CTC Alignments in an End-to-End ASR System

Cited by: 2
Authors
Egorova, Ekaterina [1]
Vydana, Hari Krishna [1]
Burget, Lukas [1]
Cernocky, Jan [1]
Affiliations
[1] Brno Univ Technol, Fac Informat Technol, Speech FIT, Brno, Czech Republic
Source
Funding
US National Science Foundation;
Keywords
Speech recognition; Out-of-vocabulary; OOV; Attention; CTC; End-to-end; MODELS;
DOI
10.21437/Interspeech.2021-1756
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
This work explores the effectiveness of detecting positions of out-of-vocabulary words (OOVs) in a decoded utterance using attention weights and CTC per-frame outputs of an end-to-end system predicting word sequences. We show that the end-to-end approach can be effective for the task of OOV detection. CTC alignments are shown to provide better temporal information about the positions of OOV words than attention, and therefore are more suitable for the task. The detected positions of OOV occurrences are utilized for the recurrent OOV recovery task in which probabilistic representations of the pronunciations of the detected OOVs are clustered in order to find repeating words. Improved detection results are shown to correlate with better performance of the recovery of recurrent OOVs.
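As an illustration of how per-frame CTC outputs can give temporal positions for OOV words, the following is a minimal sketch, not the authors' implementation. It assumes that the word-level vocabulary contains a dedicated OOV label (the hypothetical `oov_id`) and that a greedy per-frame label sequence (the argmax over CTC outputs, with blanks between tokens) is already available. The function name `oov_spans` and the example label ids are invented for this illustration.

```python
from typing import List, Tuple


def oov_spans(frame_labels: List[int], oov_id: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame spans, end exclusive, labelled as OOV.

    frame_labels is a hypothetical per-frame argmax of CTC outputs.
    Each contiguous run of oov_id is treated as one detected OOV
    occurrence, mirroring how CTC collapses repeated labels.
    """
    spans: List[Tuple[int, int]] = []
    start = None  # first frame of the run currently being tracked
    for t, label in enumerate(frame_labels):
        if label == oov_id:
            if start is None:
                start = t
        elif start is not None:
            spans.append((start, t))
            start = None
    if start is not None:
        # The utterance ended while still inside an OOV run.
        spans.append((start, len(frame_labels)))
    return spans


# Assumed label ids for illustration: 0 = CTC blank, 7 = OOV, 3 = a known word.
frames = [0, 0, 7, 7, 0, 3, 0, 7, 0]
print(oov_spans(frames, oov_id=7))  # [(2, 4), (7, 8)]
```

Multiplying the frame indices by the frame shift of the acoustic front end would convert the spans to time stamps. The frame shift depends on the system and is not given in this record.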
Pages: 2901-2905
Page count: 5
Related papers
  • [1] USING SYNTHETIC AUDIO TO IMPROVE THE RECOGNITION OF OUT-OF-VOCABULARY WORDS IN END-TO-END ASR SYSTEMS
    Zheng, Xianrui
    Liu, Yulan
    Gunceler, Deniz
    Willett, Daniel
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5674 - 5678
  • [2] Detection of Out-of-Vocabulary Words in Posterior Based ASR
    Ketabdar, Hamed
    Hannemann, Mirko
    Hermansky, Hynek
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2772 - 2775
  • [3] COPING WITH OUT-OF-VOCABULARY WORDS: OPEN VERSUS HUGE VOCABULARY ASR
    Gerosa, Matteo
    Federico, Marcello
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4313 - 4316
  • [4] AN END-TO-END SPEECH ACCENT RECOGNITION METHOD BASED ON HYBRID CTC/ATTENTION TRANSFORMER ASR
    Gao, Qiang
    Wu, Haiwei
    Sun, Yanqing
    Duan, Yitao
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7253 - 7257
  • [5] Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict
    Higuchi, Yosuke
    Watanabe, Shinji
    Chen, Nanxin
    Ogawa, Tetsuji
    Kobayashi, Tetsunori
    [J]. INTERSPEECH 2020, 2020, : 3655 - 3659
  • [6] TOWARDS CODE-SWITCHING ASR FOR END-TO-END CTC MODELS
    Li, Ke
    Li, Jinyu
    Ye, Guoli
    Zhao, Rui
    Gong, Yifan
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6076 - 6080
  • [7] Keyword Search Using Attention-Based End-to-End ASR and Frame-Synchronous Phoneme Alignments
    Yang, Runyan
    Cheng, Gaofeng
    Miao, Haoran
    Li, Ta
    Zhang, Pengyuan
    Yan, Yonghong
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 3202 - 3215
  • [8] IMPROVED MASK-CTC FOR NON-AUTOREGRESSIVE END-TO-END ASR
    Higuchi, Yosuke
    Inaguma, Hirofumi
    Watanabe, Shinji
    Ogawa, Tetsuji
    Kobayashi, Tetsunori
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 8363 - 8367
  • [9] Hybrid CTC/Attention Architecture for End-to-End Speech Recognition
    Watanabe, Shinji
    Hori, Takaaki
    Kim, Suyoun
    Hershey, John R.
    Hayashi, Tomoki
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2017, 11 (08) : 1240 - 1253
  • [10] LCANet: End-to-End Lipreading with Cascaded Attention-CTC
    Xu, Kai
    Li, Dawei
    Cassimatis, Nick
    Wang, Xiaolong
    [J]. PROCEEDINGS 2018 13TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE & GESTURE RECOGNITION (FG 2018), 2018, : 548 - 555