Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition

被引:10
|
作者
Belinkov, Yonatan [1 ,2 ]
Ali, Ahmed [3 ]
Glass, James [1 ]
机构
[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[2] Harvard John A Paulson Sch Engn & Appl Sci, Cambridge, MA 02134 USA
[3] HBKU, Qatar Comp Res Inst, Doha, Qatar
来源
关键词
speech recognition; end-to-end; phonemes; graphemes; analysis; interpretability; NEURAL-NETWORKS; RECURRENT;
D O I
10.21437/Interspeech.2019-2599
中图分类号
R36 [病理学]; R76 [耳鼻咽喉科学];
学科分类号
100104 ; 100213 ;
摘要
End-to-end neural network systems for automatic speech recognition (ASR) are trained from acoustic features to text transcriptions. In contrast to modular ASR systems, which contain separately-trained components for acoustic modeling, pronunciation lexicon, and language modeling, the end-to-end paradigm is both conceptually simpler and has the potential benefit of training the entire system on the end task. However, such neural network models are more opaque: it is not clear how to interpret the role of different parts of the network and what information it learns during training. In this paper, we analyze the learned internal representations in an end-to-end ASR model. We evaluate the representation quality in terms of several classification tasks, comparing phonemes and graphemes, as well as different articulatory features. We study two languages (English and Arabic) and three datasets, finding remarkable consistency in how different properties are represented in different layers of the deep neural network.
引用
收藏
页码:81 / 85
页数:5
相关论文
共 50 条
  • [1] Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems
    Belinkov, Yonatan
    Glass, James
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [2] An Overview of End-to-End Automatic Speech Recognition
    Wang, Dong
    Wang, Xiaodong
    Lv, Shaohe
    [J]. SYMMETRY-BASEL, 2019, 11 (08):
  • [3] Insights on Neural Representations for End-to-End Speech Recognition
    Ollerenshaw, Anna
    Jalal, Asif
    Hain, Thomas
    [J]. INTERSPEECH 2021, 2021, : 4079 - 4083
  • [4] Recent Advances in End-to-End Automatic Speech Recognition
    Li, Jinyu
    [J]. APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2022, 11 (01)
  • [5] Inverted Alignments for End-to-End Automatic Speech Recognition
    Doetsch, Patrick
    Hannemann, Mirko
    Schluter, Ralf
    Ney, Hermann
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2017, 11 (08) : 1265 - 1273
  • [6] INCREMENTAL LEARNING FOR END-TO-END AUTOMATIC SPEECH RECOGNITION
    Fu, Li
    Li, Xiaoxiao
    Zi, Libo
    Zhang, Zhengchen
    Wu, Youzheng
    He, Xiaodong
    Zhou, Bowen
    [J]. 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 320 - 327
  • [7] Continual Learning for Monolingual End-to-End Automatic Speech Recognition
    Vander Eeckt, Steven
    Van Hamme, Hugo
    [J]. 2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 459 - 463
  • [8] STRUCTURED SPARSE ATTENTION FOR END-TO-END AUTOMATIC SPEECH RECOGNITION
    Xue, Jiabin
    Zheng, Tieran
    Han, Jiqing
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7044 - 7048
  • [9] End-to-End Automatic Speech Recognition with Deep Mutual Learning
    Masumura, Ryo
    Ihori, Mana
    Takashima, Akihiko
    Tanaka, Tomohiro
    Ashihara, Takanori
    [J]. 2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 632 - 637
  • [10] Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition
    Parcollet, Titouan
    Zhang, Ying
    Morchid, Mohamed
    Trabelsi, Chiheb
    Linares, Georges
    De Mori, Renato
    Bengio, Yoshua
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 22 - 26