Out-of-vocabulary Words Detection with Attention and CTC Alignments in an End-to-End ASR System

Cited by: 2

Authors:

Egorova, Ekaterina [1]

Vydana, Hari Krishna [1]

Burget, Lukas [1]

Cernocky, Jan [1]

Affiliation:

[1] Brno Univ Technol, Fac Informat Technol, Speech FIT, Brno, Czech Republic

Source:

INTERSPEECH 2021 | 2021

Funding:

National Science Foundation (United States)

Keywords:

Speech recognition; Out-of-vocabulary; OOV; Attention; CTC; End-to-end; MODELS;

DOI:

10.21437/Interspeech.2021-1756

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification code:

100104; 100213

Abstract:

This work explores the effectiveness of detecting the positions of out-of-vocabulary words (OOVs) in a decoded utterance using the attention weights and per-frame CTC outputs of an end-to-end system that predicts word sequences. We show that the end-to-end approach can be effective for OOV detection. CTC alignments are shown to provide better temporal information about the positions of OOV words than attention weights, and are therefore more suitable for the task. The detected OOV positions are then used for recurrent OOV recovery, in which probabilistic representations of the pronunciations of the detected OOVs are clustered to find repeating words. Improved detection results are shown to correlate with better recovery of recurrent OOVs.
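The two localization signals named in the abstract can be illustrated with a minimal sketch. It assumes (not stated in the record) that the word-level model emits a dedicated OOV token such as `<unk>`, that per-frame CTC posteriors are available as a `(T, V)` NumPy array, and that attention weights are a `(U, T)` array with one row per decoded token. The function names, the greedy (argmax) reading of the CTC output and the relative attention threshold are hypothetical illustrations, not the authors' procedure.

```python
import numpy as np


def ctc_oov_spans(posteriors: np.ndarray, unk_id: int) -> list[tuple[int, int]]:
    """Hypothetical helper: frame spans [start, end) where the greedy CTC label is the OOV token.

    posteriors: (T, V) per-frame CTC output probabilities (blank included).
    Under CTC collapsing, a contiguous run of identical labels is one token,
    so each run of unk_id frames is treated as one OOV occurrence.
    """
    best = posteriors.argmax(axis=1)  # greedy label per frame
    spans = []
    start = None
    for t, label in enumerate(best):
        if label == unk_id and start is None:
            start = t  # an OOV run begins
        elif label != unk_id and start is not None:
            spans.append((start, t))  # the run ended at the previous frame
            start = None
    if start is not None:  # the run reaches the last frame
        spans.append((start, len(best)))
    return spans


def attention_oov_spans(attention: np.ndarray, tokens: list[int], unk_id: int,
                        rel_threshold: float = 0.5) -> list[tuple[int, int]]:
    """Hypothetical helper: frame spans [start, end) attended to by each decoded OOV token.

    attention: (U, T) attention weights, one row per decoded token.
    Frames with weight >= rel_threshold * row peak are kept; the span runs
    from the first to the last such frame (gaps inside are not split).
    """
    spans = []
    for step, token in enumerate(tokens):
        if token != unk_id:
            continue
        weights = attention[step]
        frames = np.flatnonzero(weights >= rel_threshold * weights.max())
        spans.append((int(frames.min()), int(frames.max()) + 1))
    return spans


if __name__ == "__main__":
    # Toy example: label 0 = CTC blank, label 3 = <unk> (assumed ids).
    labels = [0, 0, 3, 3, 0, 3, 0, 1]
    posteriors = np.eye(4)[labels]  # one-hot "posteriors" per frame
    print(ctc_oov_spans(posteriors, unk_id=3))  # [(2, 4), (5, 6)]

    attention = np.array([
        [0.7, 0.3, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.1, 0.5, 0.4, 0.0],  # decoded <unk> at step 1
        [0.0, 0.0, 0.0, 0.0, 0.2, 0.8],
    ])
    print(attention_oov_spans(attention, [1, 3, 2], unk_id=3))  # [(3, 5)]
```

The CTC variant reads positions directly off the frame-synchronous output, whereas the attention variant has to infer them from a soft alignment whose spread depends on the threshold; this difference is one plausible reading of why the abstract reports CTC alignments as the better temporal cue.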

Pages: 2901-2905

Number of pages: 5
