The Representation of Speech in Deep Neural Networks

Cited by: 2
Authors
Scharenborg, Odette [1, 2]
van der Gouw, Nikki [2]
Larson, Martha [1, 2]
Marchiori, Elena [2]
Affiliations
[1] Delft Univ Technol, Multimedia Comp Grp, Delft, Netherlands
[2] Radboud Univ Nijmegen, Nijmegen, Netherlands
Source
Keywords
Deep neural networks; Speech representations; Visualizations
DOI
10.1007/978-3-030-05716-9_16
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we investigate the connection between how people understand speech and how speech is understood by a deep neural network. A naive, general feed-forward deep neural network was trained for the task of vowel/consonant classification. Subsequently, the representations of the speech signal in the different hidden layers of the DNN were visualized. The visualizations allow us to study the distance between the representations of different types of input frames and observe the clustering structures formed by these representations. In the different visualizations, the input frames were labeled with different linguistic categories: sounds in the same phoneme class, sounds with the same manner of articulation, and sounds with the same place of articulation. We investigate whether the DNN clusters speech representations in a way that corresponds to these linguistic categories and observe evidence that the DNN does indeed appear to learn structures that humans use to understand speech without being explicitly trained to do so.
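The procedure the abstract describes (pass speech frames through a feed-forward network, collect the hidden-layer representations, and inspect how frames of different categories are spread out) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the frames are synthetic random vectors, the network is untrained with random weights, the layer sizes are arbitrary, and PCA stands in for whatever visualization technique the authors used. It only shows the mechanics of capturing per-layer activations, projecting them to two dimensions, and measuring the distance between class centroids.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def forward(X, params):
    """Run a feed-forward pass and keep the activations of every hidden layer."""
    hidden = []
    h = X
    for W, b in params[:-1]:
        h = relu(h @ W + b)
        hidden.append(h)
    W, b = params[-1]
    logits = h @ W + b
    prob = 1.0 / (1.0 + np.exp(-logits))  # sigmoid output for a binary label
    return hidden, prob.ravel()


def pca_2d(H):
    """Project the rows of H onto their first two principal components."""
    Hc = H - H.mean(axis=0)
    _, _, Vt = np.linalg.svd(Hc, full_matrices=False)
    return Hc @ Vt[:2].T


def centroid_distance(P, labels):
    """Euclidean distance between the centroids of class 0 and class 1."""
    return float(np.linalg.norm(P[labels == 0].mean(axis=0) - P[labels == 1].mean(axis=0)))


rng = np.random.default_rng(0)

# Hypothetical data: 200 frames with 13 features each and a binary label
# (standing in for consonant = 0, vowel = 1). Not real speech.
n_frames, n_features = 200, 13
labels = rng.integers(0, 2, size=n_frames)
X = rng.normal(size=(n_frames, n_features)) + 1.5 * labels[:, None]

# Hypothetical architecture: two hidden layers of 16 units, one output unit.
# Weights are random; no training is performed in this sketch.
sizes = [n_features, 16, 16, 1]
params = [
    (rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]

hidden, prob = forward(X, params)
projections = [pca_2d(H) for H in hidden]
distances = [centroid_distance(P, labels) for P in projections]

for i, dist in enumerate(distances, start=1):
    print(f"hidden layer {i}: centroid distance in 2-D projection = {dist:.3f}")
```

Each entry of `projections` is an array with one 2-D point per frame, so it can be passed to any scatter-plot routine and colored by whichever label set is of interest (phoneme class, manner of articulation, or place of articulation).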
Pages: 194-205
Page count: 12
Related papers
50 in total
  • [1] The Representation of Speech and Its Processing in the Human Brain and Deep Neural Networks
    Scharenborg, Odette
    SPEECH AND COMPUTER, SPECOM 2019, 2019, 11658 : 1 - 8
  • [2] COLOR REPRESENTATION IN DEEP NEURAL NETWORKS
    Engilberge, Martin
    Collins, Edo
    Susstrunk, Sabine
    2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 2786 - 2790
  • [3] Automatic Speech Recognition with Deep Neural Networks for Impaired Speech
    Espana-Bonet, Cristina
    Fonollosa, Jose A. R.
    ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2016, 2016, 10077 : 97 - 107
  • [4] Robust Speech Recognition with Speech Enhanced Deep Neural Networks
    Du, Jun
    Wang, Qing
    Gao, Tian
    Xu, Yong
    Dai, Lirong
    Lee, Chin-Hui
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 616 - 620
  • [5] Deep Segmental Neural Networks for Speech Recognition
    Abdel-Hamid, Ossama
    Deng, Li
    Yu, Dong
    Jiang, Hui
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1848 - 1852
  • [6] Visual representation of the speech trace using neural networks
    Gomez, P
    Rodellar, V
    Alvarez, A
    Mayo, N
    Rubio, F
    Nieto, V
    Perez, MM
    ISCAS 96: 1996 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS - CIRCUITS AND SYSTEMS CONNECTING THE WORLD, VOL 3, 1996, : 586 - 589
  • [7] Deep Neural Networks in Russian Speech Recognition
    Markovnikov, Nikita
    Kipyatkova, Irina
    Karpov, Alexey
    Filchenkov, Andrey
    ARTIFICIAL INTELLIGENCE AND NATURAL LANGUAGE, 2018, 789 : 54 - 67
  • [8] Speech watermarking using Deep Neural Networks
    Pavlovic, Kosta
    Kovacevic, Slavko
    Durovic, Igor
    2020 28TH TELECOMMUNICATIONS FORUM (TELFOR), 2020, : 292 - 295
  • [9] DEEP MAXOUT NEURAL NETWORKS FOR SPEECH RECOGNITION
    Cai, Meng
    Shi, Yongzhe
    Liu, Jia
    2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 291 - 296
  • [10] Binary Deep Neural Networks for Speech Recognition
    Xiang, Xu
    Qian, Yanmin
    Yu, Kai
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 533 - 537