The Representation of Speech in Deep Neural Networks

被引:2
|
作者
Scharenborg, Odette [1 ,2 ]
van der Gouw, Nikki [2 ]
Larson, Martha [1 ,2 ]
Marchiori, Elena [2 ]
机构
[1] Delft Univ Technol, Multimedia Comp Grp, Delft, Netherlands
[2] Radboud Univ Nijmegen, Nijmegen, Netherlands
来源
关键词
Deep neural networks; Speech representations; Visualizations;
D O I
10.1007/978-3-030-05716-9_16
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
In this paper, we investigate the connection between how people understand speech and how speech is understood by a deep neural network. A naive, general feed-forward deep neural network was trained for the task of vowel/consonant classification. Subsequently, the representations of the speech signal in the different hidden layers of the DNN were visualized. The visualizations allow us to study the distance between the representations of different types of input frames and observe the clustering structures formed by these representations. In the different visualizations, the input frames were labeled with different linguistic categories: sounds in the same phoneme class, sounds with the same manner of articulation, and sounds with the same place of articulation. We investigate whether the DNN clusters speech representations in a way that corresponds to these linguistic categories and observe evidence that the DNN does indeed appear to learn structures that humans use to understand speech without being explicitly trained to do so.
引用
下载
收藏
页码:194 / 205
页数:12
相关论文
共 50 条
  • [21] Speech Activity Detection Using Deep Neural Networks
    Shahsavari, Sajad
    Sameti, Hossein
    Hadian, Hossein
    2017 25TH IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2017, : 1564 - 1568
  • [22] Noisy training for deep neural networks in speech recognition
    Shi Yin
    Chao Liu
    Zhiyong Zhang
    Yiye Lin
    Dong Wang
    Javier Tejedor
    Thomas Fang Zheng
    Yinguo Li
    EURASIP Journal on Audio, Speech, and Music Processing, 2015
  • [23] A NETWORK OF DEEP NEURAL NETWORKS FOR DISTANT SPEECH RECOGNITION
    Ravanelli, Mirco
    Brakel, Philemon
    Omologo, Maurizio
    Bengio, Yoshua
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4880 - 4884
  • [24] Survey on Deep Neural Networks in Speech and Vision Systems
    Alam, M.
    Samad, M. D.
    Vidyaratne, L.
    Glandon, A.
    Iftekharuddin, K. M.
    NEUROCOMPUTING, 2020, 417 : 302 - 321
  • [25] SYNAPTIC DEPRESSION IN DEEP NEURAL NETWORKS FOR SPEECH PROCESSING
    Zhang, Wenhao
    Li, Hanyu
    Yang, Minda
    Mesgarani, Nima
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5865 - 5869
  • [26] Speech bandwidth expansion based on Deep Neural Networks
    Wang, Yingxue
    Zhao, Shenghui
    Liu, Wenbo
    Li, Ming
    Kuang, Jingming
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2593 - 2597
  • [27] Mongolian Speech Recognition Based on Deep Neural Networks
    Zhang, Hui
    Bao, Feilong
    Gao, Guanglai
    CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA (CCL 2015), 2015, 9427 : 180 - 188
  • [28] Emotional Speech Recognition Using Deep Neural Networks
    Trinh Van, Loan
    Dao Thi Le, Thuy
    Le Xuan, Thanh
    Castelli, Eric
    SENSORS, 2022, 22 (04)
  • [29] SPEECH ENHANCEMENT USING MULTIPLE DEEP NEURAL NETWORKS
    Karjol, Pavan
    Kumar, Ajay M.
    Ghosh, Prasanta Kumar
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5049 - 5053
  • [30] DEEP NEURAL NETWORKS FOR ESTIMATING SPEECH MODEL ACTIVATIONS
    Williamson, Donald S.
    Wang, Yuxuan
    Wang, DeLiang
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5113 - 5117