The Representation of Speech in Deep Neural Networks

Cited by: 2

Authors:

Scharenborg, Odette ^{[1,2]}

van der Gouw, Nikki ^{[2]}

Larson, Martha ^{[1,2]}

Marchiori, Elena ^{[2]}

Affiliations:

[1] Delft Univ Technol, Multimedia Comp Grp, Delft, Netherlands

[2] Radboud Univ Nijmegen, Nijmegen, Netherlands

Source:

MULTIMEDIA MODELING, MMM 2019, PT II | 2019 / Vol. 11296

Keywords:

Deep neural networks; Speech representations; Visualizations;

DOI:

10.1007/978-3-030-05716-9_16

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

In this paper, we investigate the connection between how people understand speech and how speech is understood by a deep neural network. A naive, general feed-forward deep neural network was trained for the task of vowel/consonant classification. Subsequently, the representations of the speech signal in the different hidden layers of the DNN were visualized. The visualizations allow us to study the distance between the representations of different types of input frames and observe the clustering structures formed by these representations. In the different visualizations, the input frames were labeled with different linguistic categories: sounds in the same phoneme class, sounds with the same manner of articulation, and sounds with the same place of articulation. We investigate whether the DNN clusters speech representations in a way that corresponds to these linguistic categories and observe evidence that the DNN does indeed appear to learn structures that humans use to understand speech without being explicitly trained to do so.

Pages: 194-205

Number of pages: 12

