Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems

Authors
Belinkov, Yonatan [1]
Glass, James [1]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Neural networks have become ubiquitous in automatic speech recognition systems. While neural networks are typically used as acoustic models in more complex systems, recent studies have explored end-to-end speech recognition systems based on neural networks, which can be trained to directly predict text from input acoustic features. Although such systems are conceptually elegant and simpler than traditional systems, it is less obvious how to interpret the trained models. In this work, we analyze the speech representations learned by a deep end-to-end model that is based on convolutional and recurrent layers, and trained with a connectionist temporal classification (CTC) loss. We use a pre-trained model to generate frame-level features which are given to a classifier that is trained on frame classification into phones. We evaluate representations from different layers of the deep model and compare their quality for predicting phone labels. Our experiments shed light on important aspects of the end-to-end model such as layer depth, model complexity, and other design choices.
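The layer-wise analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: synthetic vectors stand in for the frame-level features of a pre-trained model, a nearest-centroid classifier replaces the trained phone classifier for brevity, and every name and parameter below (`make_frames`, `probe_accuracy`, `layer_noise`, the noise values) is a hypothetical chosen for illustration.

```python
import random

def make_frames(n_frames, n_phones, dim, noise, rng):
    """Synthetic stand-in for one layer's frame-level features (assumption:
    real features would come from a pre-trained end-to-end ASR model)."""
    centers = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_phones)]
    frames, labels = [], []
    for _ in range(n_frames):
        phone = rng.randrange(n_phones)
        frames.append([c + rng.gauss(0, noise) for c in centers[phone]])
        labels.append(phone)
    return frames, labels

def fit_centroids(frames, labels, n_phones):
    """Average the training frames of each phone class."""
    dim = len(frames[0])
    sums = [[0.0] * dim for _ in range(n_phones)]
    counts = [0] * n_phones
    for x, y in zip(frames, labels):
        counts[y] += 1
        for j, value in enumerate(x):
            sums[y][j] += value
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]

def predict(centroids, x):
    """Return the phone whose centroid is closest to frame x."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(range(len(centroids)), key=lambda k: sqdist(centroids[k]))

def probe_accuracy(frames, labels, n_phones, train_fraction=0.8):
    """Fit the probe on the first part of the frames, score it on the rest."""
    split = int(len(frames) * train_fraction)
    centroids = fit_centroids(frames[:split], labels[:split], n_phones)
    held_out = list(zip(frames[split:], labels[split:]))
    correct = sum(predict(centroids, x) == y for x, y in held_out)
    return correct / len(held_out)

rng = random.Random(0)
# Hypothetical layers: a lower noise value mimics a layer whose
# representations separate phones more cleanly.
layer_noise = {"layer_1": 3.0, "layer_2": 0.3}
accuracies = {}
for name, noise in layer_noise.items():
    frames, labels = make_frames(1000, n_phones=5, dim=8, noise=noise, rng=rng)
    accuracies[name] = probe_accuracy(frames, labels, n_phones=5)
    print(f"{name}: frame-level phone accuracy = {accuracies[name]:.3f}")
```

The quantity of interest is the held-out accuracy per layer: a higher value suggests that the layer's representation encodes more phonetic information. The ASR model itself stays fixed throughout, and only the probe is fitted.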
Pages: 11