Spelling-Aware Word-Based End-to-End ASR

Cited by: 1
Authors
Egorova, Ekaterina [1]
Vydana, Hari Krishna [1]
Burget, Lukas [1]
Cernocky, Jan Honza [1]
Affiliations
[1] Brno Univ Technol, Fac Informat Technol Speech FIT, CS-61090 Brno, Czech Republic
Funding
EU Horizon 2020
Keywords
Training; Vocabulary; Task analysis; Decoding; Predictive models; Training data; Recurrent neural networks; ASR; end-to-end; listen attend and spell architecture; OOV
DOI
10.1109/LSP.2022.3192199
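Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline Classification Codes
0808; 0809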
Abstract
We propose a new end-to-end architecture for automatic speech recognition that expands the "listen, attend and spell" (LAS) paradigm. While the main word-predicting network is trained to predict words, the secondary speller network is optimized to predict word spellings from inner representations of the main network (e.g., word embeddings or context vectors from the attention module). We show that this joint training improves the word error rate of a word-based system and enables solving additional tasks, such as out-of-vocabulary word detection and recovery. The tests are conducted on the LibriSpeech dataset, which consists of 1000 h of read speech.
Pages: 1729-1733
Page count: 5