Out-of-vocabulary Words Detection with Attention and CTC Alignments in an End-to-End ASR System

Cited by: 2
Authors
Egorova, Ekaterina [1]
Vydana, Hari Krishna [1]
Burget, Lukas [1]
Cernocky, Jan [1]
Affiliations
[1] Brno Univ Technol, Fac Informat Technol, Speech FIT, Brno, Czech Republic
Source
INTERSPEECH 2021
Funding
National Science Foundation (USA);
Keywords
Speech recognition; Out-of-vocabulary; OOV; Attention; CTC; End-to-end; MODELS;
DOI
10.21437/Interspeech.2021-1756
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
This work explores the effectiveness of detecting positions of out-of-vocabulary words (OOVs) in a decoded utterance using attention weights and CTC per-frame outputs of an end-to-end system predicting word sequences. We show that the end-to-end approach can be effective for the task of OOV detection. CTC alignments are shown to provide better temporal information about the positions of OOV words than attention, and therefore are more suitable for the task. The detected positions of OOV occurrences are utilized for the recurrent OOV recovery task in which probabilistic representations of the pronunciations of the detected OOVs are clustered in order to find repeating words. Improved detection results are shown to correlate with better performance of the recovery of recurrent OOVs.
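As an illustration of how per-frame CTC outputs can localize OOVs in time, the following is a minimal sketch, not the authors' implementation. It assumes the word-level output vocabulary contains a dedicated OOV symbol (the index `oov_id` is hypothetical) and that per-frame posteriors are available as plain lists; the function names `greedy_path` and `oov_frame_spans` are introduced here for illustration only.

```python
from itertools import groupby


def greedy_path(posteriors):
    """Pick the most probable output symbol for every frame (greedy CTC path)."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in posteriors]


def oov_frame_spans(frame_labels, oov_id):
    """Return (start_frame, end_frame) spans, end exclusive, where the path emits oov_id.

    oov_id is an assumed index of an OOV symbol in the output vocabulary.
    """
    spans = []
    t = 0
    for label, run in groupby(frame_labels):
        n = sum(1 for _ in run)  # length of this run of identical labels
        if label == oov_id:
            spans.append((t, t + n))
        t += n
    return spans


# Toy example with 3 output symbols: 0 = blank, 1 = a known word, 2 = assumed OOV symbol.
posteriors = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.70, 0.20, 0.10],
    [0.10, 0.10, 0.80],
    [0.20, 0.10, 0.70],
    [0.90, 0.05, 0.05],
]
path = greedy_path(posteriors)
spans = oov_frame_spans(path, oov_id=2)
```

Frame spans can be converted to times by multiplying by the encoder frame shift, which depends on the model's subsampling. CTC outputs are often spiky, so in practice a detected span may cover only a few frames of the actual word.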
Pages: 2901-2905
Page count: 5