Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems

Cited by: 0
Authors
Belinkov, Yonatan [1]
Glass, James [1]
Affiliation
[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
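Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract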
Neural networks have become ubiquitous in automatic speech recognition systems. While neural networks are typically used as acoustic models in more complex systems, recent studies have explored end-to-end speech recognition systems based on neural networks, which can be trained to directly predict text from input acoustic features. Although such systems are conceptually elegant and simpler than traditional systems, it is less obvious how to interpret the trained models. In this work, we analyze the speech representations learned by a deep end-to-end model that is based on convolutional and recurrent layers, and trained with a connectionist temporal classification (CTC) loss. We use a pre-trained model to generate frame-level features which are given to a classifier that is trained on frame classification into phones. We evaluate representations from different layers of the deep model and compare their quality for predicting phone labels. Our experiments shed light on important aspects of the end-to-end model such as layer depth, model complexity, and other design choices.
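The probing setup described in the abstract (frame-level features from a frozen model, fed to a classifier trained to predict phone labels, compared across layers) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the features are synthetic stand-ins rather than activations from a real ASR model, the names `layer_a`, `layer_b`, `PHONES`, `CENTROIDS`, and `probe_layer` are hypothetical, and a linear softmax classifier is swapped in for simplicity because the abstract does not specify the classifier architecture. The numbers it prints say nothing about the paper's findings.

```python
import math
import random

# Hypothetical phone inventory; a real setup would use the corpus's phone set.
PHONES = ["aa", "iy", "s"]
# Hypothetical class centroids standing in for frame-level activations.
CENTROIDS = [[2.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0]]


def synthetic_frames(noise, frames_per_phone, rng):
    """Stand-in for features extracted from one layer of a frozen ASR model."""
    features, labels = [], []
    for label, centroid in enumerate(CENTROIDS):
        for _ in range(frames_per_phone):
            features.append([c + rng.gauss(0.0, noise) for c in centroid])
            labels.append(label)
    return features, labels


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(weights, biases, x):
    """Linear score per phone class for one feature vector."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def train_probe(features, labels, num_classes, rng, epochs=30, lr=0.05):
    """Fit a linear softmax probe with plain SGD; the features stay fixed."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], labels[i]
            probs = softmax(class_scores(weights, biases, x))
            for c in range(num_classes):
                # Gradient of cross-entropy with respect to the class score.
                grad = probs[c] - (1.0 if c == y else 0.0)
                biases[c] -= lr * grad
                for j in range(dim):
                    weights[c][j] -= lr * grad * x[j]
    return weights, biases


def accuracy(weights, biases, features, labels):
    """Fraction of frames whose highest-scoring class matches the label."""
    correct = 0
    for x, y in zip(features, labels):
        scores = class_scores(weights, biases, x)
        if scores.index(max(scores)) == y:
            correct += 1
    return correct / len(labels)


def probe_layer(noise, seed=0):
    """Train a probe on one synthetic 'layer' and return held-out accuracy."""
    rng = random.Random(seed)
    train_x, train_y = synthetic_frames(noise, 60, rng)
    test_x, test_y = synthetic_frames(noise, 40, rng)
    weights, biases = train_probe(train_x, train_y, len(PHONES), rng)
    return accuracy(weights, biases, test_x, test_y)


# Compare two hypothetical layers that differ only in how noisy their features are.
for name, noise in [("layer_a", 0.3), ("layer_b", 3.0)]:
    print(name, "frame accuracy:", round(probe_layer(noise), 3))
```

In a real experiment, `synthetic_frames` would be replaced by activations read out of each layer of the pre-trained model, aligned with frame-level phone labels. Held-out probe accuracy then serves as a rough measure of how much phonetic information each layer's representation makes easily accessible.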
Pages: 11