Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems

Cited by: 0
Authors
Belinkov, Yonatan [1 ]
Glass, James [1 ]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
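Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes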
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Neural networks have become ubiquitous in automatic speech recognition systems. While neural networks are typically used as acoustic models in more complex systems, recent studies have explored end-to-end speech recognition systems based on neural networks, which can be trained to directly predict text from input acoustic features. Although such systems are conceptually elegant and simpler than traditional systems, it is less obvious how to interpret the trained models. In this work, we analyze the speech representations learned by a deep end-to-end model that is based on convolutional and recurrent layers, and trained with a connectionist temporal classification (CTC) loss. We use a pre-trained model to generate frame-level features which are given to a classifier that is trained on frame classification into phones. We evaluate representations from different layers of the deep model and compare their quality for predicting phone labels. Our experiments shed light on important aspects of the end-to-end model such as layer depth, model complexity, and other design choices.
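The probing setup described in the abstract (frame-level features from a fixed model, fed to a separate classifier that predicts phone labels) can be illustrated with a minimal sketch. This is not the authors' code. It assumes synthetic arrays in place of real layer activations, a plain softmax-regression probe trained by gradient descent, and hypothetical names throughout (`train_probe`, `probe_accuracy`, `make_synthetic_layer`, `layer_a`, `layer_b`).

```python
import numpy as np


def train_probe(features, labels, num_classes, epochs=200, lr=0.1):
    """Fit a softmax-regression probe with full-batch gradient descent."""
    n, d = features.shape
    weights = np.zeros((d, num_classes))
    bias = np.zeros(num_classes)
    one_hot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = features @ weights + bias
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - one_hot) / n  # gradient of mean cross-entropy w.r.t. logits
        weights -= lr * (features.T @ grad)
        bias -= lr * grad.sum(axis=0)
    return weights, bias


def probe_accuracy(weights, bias, features, labels):
    """Fraction of frames whose predicted phone matches the label."""
    predictions = (features @ weights + bias).argmax(axis=1)
    return float((predictions == labels).mean())


def make_synthetic_layer(labels, num_classes, dim, signal, rng):
    """Assumption: stand-in for frame-level activations from one layer.

    `signal` controls how strongly the phone identity is encoded;
    signal=0 yields features that carry no phone information.
    """
    centroids = rng.normal(size=(num_classes, dim))
    noise = rng.normal(size=(labels.size, dim))
    return signal * centroids[labels] + noise


rng = np.random.default_rng(0)
num_phones, dim = 5, 16
labels = rng.integers(0, num_phones, size=2500)  # one phone label per frame

# Two hypothetical layers: one phone-informative, one pure noise.
layer_signal = {"layer_a": 1.5, "layer_b": 0.0}
results = {}
for name, signal in layer_signal.items():
    feats = make_synthetic_layer(labels, num_phones, dim, signal, rng)
    w, b = train_probe(feats[:2000], labels[:2000], num_phones)
    results[name] = probe_accuracy(w, b, feats[2000:], labels[2000:])

for name, acc in results.items():
    print(f"{name}: held-out frame accuracy = {acc:.3f}")
```

Under these assumptions, a layer whose features encode phone identity more strongly yields higher held-out probe accuracy, which is the kind of per-layer comparison the abstract describes.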
Pages: 11
Related papers
50 in total
  • [41] An Investigation Into On-device Personalization of End-to-end Automatic Speech Recognition Models
    Sim, Khe Chai
    Zadrazil, Petr
    Beaufays, Francoise
    [J]. INTERSPEECH 2019, 2019, : 774 - 778
  • [42] Hardware Accelerator for Transformer based End-to-End Automatic Speech Recognition System
    Yamini, Shaarada D.
    Mirishkar, Ganesh S.
    Vuppala, Anil Kumar
    Purini, Suresh
    [J]. 2023 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, IPDPSW, 2023, : 93 - 100
  • [43] Online Hybrid CTC/Attention End-to-End Automatic Speech Recognition Architecture
    Miao, Haoran
    Cheng, Gaofeng
    Zhang, Pengyuan
    Yan, Yonghong
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1452 - 1465
  • [44] AUDITORY-BASED DATA AUGMENTATION FOR END-TO-END AUTOMATIC SPEECH RECOGNITION
    Tu, Zehai
    Deadman, Jack
    Ma, Ning
    Barker, Jon
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7447 - 7451
  • [45] Spectrograms Fusion-based End-to-end Robust Automatic Speech Recognition
    Shi, Hao
    Wang, Longbiao
    Li, Sheng
    Fang, Cunhang
    Dang, Jianwu
    Kawahara, Tatsuya
    [J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 438 - 442
  • [46] LWMD: A Comprehensive Compression Platform for End-to-End Automatic Speech Recognition Models
    Liu, Yukun
    Li, Ta
    Zhang, Pengyuan
    Yan, Yonghong
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (03):
  • [47] Subword Regularization: An Analysis of Scalability and Generalization for End-to-End Automatic Speech Recognition
    Lakomkin, Egor
    Heymann, Jahn
    Sklyar, Ilya
    Wiesler, Simon
    [J]. INTERSPEECH 2020, 2020, : 3600 - 3604
  • [48] SUBWORD REGULARIZATION AND BEAM SEARCH DECODING FOR END-TO-END AUTOMATIC SPEECH RECOGNITION
    Drexler, Jennifer
    Glass, James
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6266 - 6270
  • [49] End-To-End deep neural models for Automatic Speech Recognition for Polish Language
    Pondel-Sycz, Karolina
    Pietrzak, Agnieszka Paula
    Szymla, Julia
    [J]. INTERNATIONAL JOURNAL OF ELECTRONICS AND TELECOMMUNICATIONS, 2024, 70 (02) : 315 - 321
  • [50] END-TO-END AUTOMATIC SPEECH TRANSLATION OF AUDIOBOOKS
    Berard, Alexandre
    Besacier, Laurent
    Kocabiyikoglu, Ali Can
    Pietquin, Olivier
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6224 - 6228