The Representation of Speech in Deep Neural Networks

Cited by: 2
Authors
Scharenborg, Odette [1, 2]
van der Gouw, Nikki [2]
Larson, Martha [1, 2]
Marchiori, Elena [2]
Affiliations
[1] Delft Univ Technol, Multimedia Comp Grp, Delft, Netherlands
[2] Radboud Univ Nijmegen, Nijmegen, Netherlands
Source
Keywords
Deep neural networks; Speech representations; Visualizations
DOI
10.1007/978-3-030-05716-9_16
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we investigate the connection between how people understand speech and how speech is understood by a deep neural network. A naive, general feed-forward deep neural network was trained for the task of vowel/consonant classification. Subsequently, the representations of the speech signal in the different hidden layers of the DNN were visualized. The visualizations allow us to study the distance between the representations of different types of input frames and observe the clustering structures formed by these representations. In the different visualizations, the input frames were labeled with different linguistic categories: sounds in the same phoneme class, sounds with the same manner of articulation, and sounds with the same place of articulation. We investigate whether the DNN clusters speech representations in a way that corresponds to these linguistic categories and observe evidence that the DNN does indeed appear to learn structures that humans use to understand speech without being explicitly trained to do so.
Pages: 194-205
Number of pages: 12
Related papers (50 in total)
  • [31] Speech De-identification with Deep Neural Networks
    Fodor, Adam
    Kopacsi, Laszlo
    Milacski, Zoltan A.
    Lorincz, Andras
    ACTA CYBERNETICA, 2021, 25 (02): : 257 - 269
  • [32] DEEP SPARSE RECTIFIER NEURAL NETWORKS FOR SPEECH DENOISING
    Xu, Lie
    Choy, Chiu-Sing
    Li, Yi-Wen
    2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2016,
  • [33] COMPRESSING DEEP NEURAL NETWORKS FOR EFFICIENT SPEECH ENHANCEMENT
    Tan, Ke
    Wang, DeLiang
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 8358 - 8362
  • [34] INVESTIGATING SPARSE DEEP NEURAL NETWORKS FOR SPEECH RECOGNITION
    Pironkov, Gueorgui
    Dupont, Stephane
    Dutoit, Thierry
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 124 - 129
  • [35] Figure-ground representation in deep neural networks
    Hu, Brian
    Khan, Salman
    Niebur, Ernst
    Tripp, Bryan
    2019 53RD ANNUAL CONFERENCE ON INFORMATION SCIENCES AND SYSTEMS (CISS), 2019,
  • [36] Representation of Imprecision in Deep Neural Networks for Image Classification
    Zhang, Zuowei
    Liu, Zhunga
    Ning, Liangbo
    Martin, Arnaud
    Xiong, Jiexuan
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, : 1 - 14
  • [37] Compact and Computationally Efficient Representation of Deep Neural Networks
    Wiedemann, Simon
    Mueller, Klaus-Robert
    Samek, Wojciech
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (03) : 772 - 785
  • [38] Representation of Nonlocal Shape Information in Deep Neural Networks
    Keshvari, Shaiyan
    Frund, Ingo
    Elder, James H.
    PERCEPTION, 2019, 48 : 78 - 78
  • [39] EXPLORING DEEP NEURAL NETWORKS AND DEEP AUTOENCODERS IN REVERBERANT SPEECH RECOGNITION
    Mimura, Masato
    Sakai, Shinsuke
    Kawahara, Tatsuya
    2014 4TH JOINT WORKSHOP ON HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS (HSCMA), 2014, : 197 - 201
  • [40] Deep representation-based transfer learning for deep neural networks
    Yang, Tao
    Yu, Xia
    Ma, Ning
    Zhang, Yifu
    Li, Hongru
    KNOWLEDGE-BASED SYSTEMS, 2022, 253