The Representation of Speech in Deep Neural Networks

Cited by: 2

Authors:

Scharenborg, Odette ^{[1,2]}

van der Gouw, Nikki ^{[2]}

Larson, Martha ^{[1,2]}

Marchiori, Elena ^{[2]}

Affiliations:

[1] Delft Univ Technol, Multimedia Comp Grp, Delft, Netherlands

[2] Radboud Univ Nijmegen, Nijmegen, Netherlands

Source:

MULTIMEDIA MODELING, MMM 2019, PT II | 2019 / Vol. 11296

Keywords:

Deep neural networks; Speech representations; Visualizations;

DOI:

10.1007/978-3-030-05716-9_16

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

In this paper, we investigate the connection between how people understand speech and how speech is understood by a deep neural network. A naive, general feed-forward deep neural network was trained for the task of vowel/consonant classification. Subsequently, the representations of the speech signal in the different hidden layers of the DNN were visualized. The visualizations allow us to study the distance between the representations of different types of input frames and observe the clustering structures formed by these representations. In the different visualizations, the input frames were labeled with different linguistic categories: sounds in the same phoneme class, sounds with the same manner of articulation, and sounds with the same place of articulation. We investigate whether the DNN clusters speech representations in a way that corresponds to these linguistic categories and observe evidence that the DNN does indeed appear to learn structures that humans use to understand speech without being explicitly trained to do so.


Pages: 194-205

Number of pages: 12
