Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems

Cited by: 0

Authors:

Belinkov, Yonatan ^{[1]}

Glass, James ^{[1]}

Affiliations:

[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017) | 2017 / Vol. 30

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Neural networks have become ubiquitous in automatic speech recognition systems. While neural networks are typically used as acoustic models in more complex systems, recent studies have explored end-to-end speech recognition systems based on neural networks, which can be trained to directly predict text from input acoustic features. Although such systems are conceptually elegant and simpler than traditional systems, it is less obvious how to interpret the trained models. In this work, we analyze the speech representations learned by a deep end-to-end model that is based on convolutional and recurrent layers, and trained with a connectionist temporal classification (CTC) loss. We use a pre-trained model to generate frame-level features which are given to a classifier that is trained on frame classification into phones. We evaluate representations from different layers of the deep model and compare their quality for predicting phone labels. Our experiments shed light on important aspects of the end-to-end model such as layer depth, model complexity, and other design choices.
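
The frame-level analysis described in the abstract (frozen features from a pre-trained model, fed to a separately trained phone classifier, compared across layers) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the features are synthetic stand-ins rather than real ASR activations, the names `layer_a` and `layer_b` are hypothetical, and the probe is a plain softmax regression, which is not necessarily the classifier the authors used.

```python
import numpy as np


def train_probe(features, labels, num_classes, epochs=200, lr=0.1):
    """Fit a softmax-regression probe on frozen frame-level features."""
    n, d = features.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (features.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def probe_accuracy(W, b, features, labels):
    """Frame classification accuracy of a trained probe."""
    preds = (features @ W + b).argmax(axis=1)
    return float((preds == labels).mean())


rng = np.random.default_rng(0)
num_phones, dim = 5, 16  # toy sizes, assumed for illustration
train_labels = rng.integers(0, num_phones, size=2000)
test_labels = rng.integers(0, num_phones, size=500)

# Hypothetical "layers": each maps a phone label to a centroid plus noise.
# A lower noise level mimics a layer that encodes phone identity more cleanly.
layer_noise = {"layer_a": 0.5, "layer_b": 2.0}

results = {}
for name, noise in layer_noise.items():
    centroids = rng.normal(size=(num_phones, dim))
    x_train = centroids[train_labels] + noise * rng.normal(size=(train_labels.size, dim))
    x_test = centroids[test_labels] + noise * rng.normal(size=(test_labels.size, dim))
    W, b = train_probe(x_train, train_labels, num_phones)
    results[name] = probe_accuracy(W, b, x_test, test_labels)

for name, acc in results.items():
    print(f"{name}: probe accuracy = {acc:.3f}")
```

In this setup the probe's test accuracy serves as a proxy for how much phone information each representation carries; with real data, the synthetic arrays would be replaced by activations extracted from each layer of the pre-trained model.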

Pages: 11
