Spelling-Aware Word-Based End-to-End ASR

Cited by: 1
Authors
Egorova, Ekaterina [1]
Vydana, Hari Krishna [1]
Burget, Lukas [1]
Cernocky, Jan Honza [1]
Affiliations
[1] Brno Univ Technol, Fac Informat Technol Speech FIT, CS-61090 Brno, Czech Republic
Funding
EU Horizon 2020;
Keywords
Training; Vocabulary; Task analysis; Decoding; Predictive models; Training data; Recurrent neural networks; ASR; end-to-end; listen attend and spell architecture; OOV;
DOI
10.1109/LSP.2022.3192199
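Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code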
0808 ; 0809 ;
Abstract
We propose a new end-to-end architecture for automatic speech recognition that expands the "listen, attend and spell" (LAS) paradigm. While the main word-predicting network is trained to predict words, the secondary speller network is optimized to predict word spellings from inner representations of the main network (e.g. word embeddings or context vectors from the attention module). We show that this joint training improves the word error rate of a word-based system and enables solving additional tasks, such as out-of-vocabulary word detection and recovery. The tests are conducted on the LibriSpeech dataset, which consists of 1000 h of read speech.
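The joint training described in the abstract can be illustrated with a toy objective that adds a word-prediction loss and a spelling loss. This is a minimal sketch under stated assumptions, not the authors' implementation: the weighted-sum form, the function names `cross_entropy` and `joint_loss`, and the `spell_weight` parameter are all hypothetical, and the probability lists stand in for the outputs of the word-predicting and speller networks.

```python
import math


def cross_entropy(probs, target):
    """Negative log-probability assigned to the target index."""
    return -math.log(probs[target])


def joint_loss(word_probs, word_target, char_probs_seq, char_targets,
               spell_weight=0.5):
    """Toy joint objective: word loss plus a weighted spelling loss.

    word_probs     -- distribution over the word vocabulary (main network)
    word_target    -- index of the reference word
    char_probs_seq -- one distribution over characters per spelling step
                      (speller network)
    char_targets   -- reference character indices, one per step
    spell_weight   -- hypothetical interpolation weight (assumption)
    """
    word_loss = cross_entropy(word_probs, word_target)
    # Average the per-character losses over the spelling of the word.
    spell_loss = sum(
        cross_entropy(p, t) for p, t in zip(char_probs_seq, char_targets)
    ) / len(char_targets)
    return word_loss + spell_weight * spell_loss


# Usage: a two-word vocabulary and a two-character spelling.
loss = joint_loss([0.5, 0.5], 0, [[0.5, 0.5], [0.5, 0.5]], [0, 1],
                  spell_weight=1.0)
print(round(loss, 4))
```

In a real system both losses would be backpropagated through shared parameters, so that errors in the spelling also shape the inner representations of the main network; the sketch only shows how the two terms might be combined.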
Pages: 1729-1733
Number of pages: 5
Related papers
50 in total
  • [1] END-TO-END SPEECH RECOGNITION WITH WORD-BASED RNN LANGUAGE MODELS
    Hori, Takaaki
    Cho, Jaejin
    Watanabe, Shinji
    [J]. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 389 - 396
  • [2] ASR-AWARE END-TO-END NEURAL DIARIZATION
    Khare, Aparna
    Han, Eunjung
    Yang, Yuguang
    Stolcke, Andreas
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8092 - 8096
  • [3] SPEAKER AND LANGUAGE AWARE TRAINING FOR END-TO-END ASR
    Bansal, Shubham
    Malhotra, Karan
    Ganapathy, Sriram
    [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 494 - 501
  • [4] LEARNING WORD-LEVEL CONFIDENCE FOR SUBWORD END-TO-END ASR
    Qiu, David
    Li, Qiujia
    He, Yanzhang
    Zhang, Yu
    Li, Bo
    Cao, Liangliang
    Prabhavalkar, Rohit
    Bhatia, Deepti
    Li, Wei
    Hu, Ke
    Sainath, Tara N.
    McGraw, Ian
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6393 - 6397
  • [5] Class LM and Word Mapping for Contextual Biasing in End-to-End ASR
    Huang, Rongqing
    Abdel-hamid, Ossama
    Li, Xinwei
    Evermann, Gunnar
    [J]. INTERSPEECH 2020, 2020, : 4348 - 4351
  • [6] META-LEARNING FOR IMPROVING RARE WORD RECOGNITION IN END-TO-END ASR
    Lux, Florian
    Ngoc Thang Vu
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5974 - 5978
  • [7] DOES SPEECH ENHANCEMENT WORK WITH END-TO-END ASR OBJECTIVES?: EXPERIMENTAL ANALYSIS OF MULTICHANNEL END-TO-END ASR
    Ochiai, Tsubasa
    Watanabe, Shinji
    Katagiri, Shigeru
    [J]. 2017 IEEE 27TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, 2017,
  • [8] BOUNDARY AND CONTEXT AWARE TRAINING FOR CIF-BASED NON-AUTOREGRESSIVE END-TO-END ASR
    Yu, Fan
    Luo, Haoneng
    Guo, Pengcheng
    Bang, Yuhao
    Yao, Zhuoyuan
    Xie, Lei
    Gao, Yingying
    Hou, Leijing
    Zhang, Shilei
    [J]. 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 328 - 334
  • [9] Auxiliary feature based adaptation of end-to-end ASR systems
    Delcroix, Marc
    Watanabe, Shinji
    Ogawa, Atsunori
    Karita, Shigeki
    Nakatani, Tomohiro
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2444 - 2448
  • [10] End-to-End Speech Emotion Recognition Combined with Acoustic-to-Word ASR Model
    Feng, Han
    Ueno, Sei
    Kawahara, Tatsuya
    [J]. INTERSPEECH 2020, 2020, : 501 - 505