Inverted Alignments for End-to-End Automatic Speech Recognition

Cited by: 7
Authors
Doetsch, Patrick [1 ]
Hannemann, Mirko [1 ]
Schluter, Ralf [1 ]
Ney, Hermann [1 ]
Affiliations
[1] Rhein Westfal TH Aachen, Comp Sci Dept, Lehrstuhl Informat 6, D-52062 Aachen, Germany
Funding
European Research Council;
Keywords
Automatic speech recognition (ASR); end-to-end; alignment; connectionist temporal classification (CTC); attention mechanism; segmental models;
DOI
10.1109/JSTSP.2017.2752691
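Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code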
0808 ; 0809 ;
Abstract
In this paper, we propose an inverted alignment approach for sequence classification systems such as automatic speech recognition (ASR) that naturally incorporates discriminative, artificial-neural-network-based label distributions. Instead of aligning each input frame to a state label, as in the standard hidden Markov model (HMM) derivation, we propose to inversely align each element of an HMM state label sequence to a segment-wise encoding of several consecutive input frames. This enables an integrated discriminative model that can be trained end-to-end, either from scratch or starting from an existing alignment path. The approach does not assume the usual decomposition into a separate (generative) acoustic model and a language model, and it allows for a variety of model assumptions, including statistical variants of attention. Following our initial paper with proof-of-concept experiments on handwriting recognition, this paper focuses on integrated training and an inverted decoding approach, while the acoustic modeling remains largely similar to standard hybrid modeling. We provide experiments on the CHiME-4 noisy ASR task. Our results show that inverted alignment and decoding strategies can reach competitive performance.
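The change in alignment direction described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' model: the function names `invert_alignment` and `encode_segments`, the toy data, and the use of a plain average as the segment encoding are all assumptions made for illustration. The sketch only shows how a frame-wise alignment (one label per frame) can be turned into a label-wise view in which each label points to a span of consecutive frames.

```python
from itertools import groupby


def invert_alignment(frame_labels):
    """Turn a frame-wise alignment (one label per frame) into a label-wise
    one: a list of (label, start, end) spans, with `end` exclusive.

    Assumption: adjacent identical labels belong to the same label position,
    so a label sequence with true repetitions would need explicit boundaries.
    """
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        segments.append((label, start, start + length))
        start += length
    return segments


def encode_segments(frames, segments):
    """Encode each span by averaging its frames.

    The average is a hypothetical stand-in for a learned segment encoder.
    """
    encoded = []
    for _, start, end in segments:
        window = frames[start:end]
        dim = len(window[0])
        encoded.append(
            [sum(frame[d] for frame in window) / len(window) for d in range(dim)]
        )
    return encoded


# Toy example: six 2-dimensional frames aligned to three labels.
frame_labels = ["a", "a", "b", "b", "b", "c"]
frames = [[0.0, 1.0], [2.0, 3.0], [1.0, 1.0], [1.0, 1.0], [4.0, 1.0], [5.0, 5.0]]

segments = invert_alignment(frame_labels)
encodings = encode_segments(frames, segments)
```

In the standard direction the model is indexed by frame (six entries here); in the inverted direction it is indexed by label position (three entries), each paired with one encoding of several consecutive frames.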
Pages: 1265-1273
Page count: 9