Information encoding by deep neural networks: what can we learn?

Cited by: 3
Authors
ten Bosch, L. [1 ,2 ]
Boves, L. [1 ]
Affiliations
[1] Radboud Univ Nijmegen, Nijmegen, Netherlands
[2] Max Planck Inst Psycholinguist, Nijmegen, Netherlands
Keywords
deep neural networks; conventional knowledge; information encoding; structure discovery;
DOI
10.21437/Interspeech.2018-1896
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The recent advent of deep learning techniques in speech technology, and in particular in automatic speech recognition, has yielded substantial performance improvements. This suggests that deep neural networks (DNNs) are able to capture structure in speech data that older methods for acoustic modeling, such as Gaussian Mixture Models and shallow neural networks, fail to uncover. In image recognition it is possible to link representations on the first couple of layers in DNNs to structural properties of images, and to representations on early layers in the visual cortex. This raises the question whether it is possible to accomplish a similar feat with representations on DNN layers when processing speech input. In this paper we present three different experiments in which we attempt to untangle how DNNs encode speech signals, and to relate these representations to phonetic knowledge, with the aim of advancing conventional phonetic concepts and of choosing the topology of a DNN more efficiently. Two experiments investigate representations formed by auto-encoders. A third experiment investigates representations on convolutional layers that treat speech spectrograms as if they were images. The results lay the basis for future experiments with recursive networks.
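The abstract mentions auto-encoders and convolutional layers that treat speech spectrograms as images. As an illustration only, and not the authors' implementation, the following minimal PyTorch sketch shows a small convolutional auto-encoder over a log-mel spectrogram whose bottleneck activations could be extracted and related to phonetic categories; all layer sizes, names, and the dummy input shape are assumptions chosen for the example.

# Illustrative sketch (assumed architecture, not from the paper): a convolutional
# auto-encoder that treats a (mel bands x frames) spectrogram as a 1-channel image
# and exposes its bottleneck representation for analysis.
import torch
import torch.nn as nn

class SpecAutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: two strided convolutions downsample the spectrogram "image".
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # Decoder: mirrored transposed convolutions reconstruct the input.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=3, stride=2, padding=1, output_padding=1),
        )

    def forward(self, x):
        z = self.encoder(x)            # bottleneck activations to probe
        return self.decoder(z), z

model = SpecAutoEncoder()
spec = torch.randn(1, 1, 64, 128)      # dummy batch: 64 mel bands x 128 frames
recon, code = model(spec)
loss = nn.functional.mse_loss(recon, spec)  # reconstruction objective
print(code.shape)                       # intermediate representation for inspection

The captured bottleneck tensor could, for example, be correlated with frame-level phone labels to probe what structure the hidden layers encode, in the spirit of the experiments described above.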
Pages: 1457-1461
Page count: 5