Convolutional Neural Turing Machine for Speech Separation

Times cited: 0

Authors:

Chien, Jen-Tzung [1]

Tsou, Kai-Wei [1]

Affiliations:

[1] Natl Chiao Tung Univ, Dept Elect & Comp Engn, Hsinchu, Taiwan

Source:

2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2018

Keywords:

Recurrent neural network; convolutional neural network; neural Turing machine; monaural speech separation

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Long short-term memory (LSTM) has been successfully applied to monaural speech separation. Temporal information is learned through dynamic states that evolve over time and are stored as an internal memory, and the spectro-temporal data matrix of the mixed signal is flattened into input vectors. This approach has two limitations. First, the internal memory in LSTM cannot sufficiently characterize long-term information from different sources. Second, the temporal correlation and frequency neighborhood structure are smeared in the flattened vectors. To deal with these limitations, this paper presents a convolutional neural Turing machine (ConvNTM) in which the feature maps of spectro-temporal data are extracted and embedded in an external memory at each time step. ConvNTM aims to preserve the spectro-temporal structure in long sequential signals, which is exploited to estimate the separated spectral signals. An addressing mechanism is introduced to continuously calculate the read and write heads, which retrieve and update memory slots, respectively. The memory-augmented source separation is implemented for single-channel speech enhancement. Experimental results illustrate the superiority of ConvNTM over LSTM, NTM and convolutional LSTM for speech enhancement in terms of the short-term objective intelligibility measure.
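
The read and write heads mentioned in the abstract can be illustrated with a minimal sketch of content-based addressing as used in a generic neural Turing machine. This is not the ConvNTM of the paper: it assumes a plain matrix memory with vector slots rather than stored feature maps, and the function names, slot sizes and the sharpness parameter `beta` are hypothetical choices made only for illustration.

```python
import numpy as np

def content_weights(memory, key, beta):
    """Softmax over scaled cosine similarity between a key and each slot."""
    eps = 1e-8  # guards against division by zero for all-zero slots
    sim = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + eps)
    scores = beta * sim
    scores = scores - scores.max()  # subtract the max for numerical stability
    w = np.exp(scores)
    return w / w.sum()

def read(memory, w):
    """Read head: weighted sum of memory slots."""
    return w @ memory

def write(memory, w, erase, add):
    """Write head: erase each slot in proportion to w, then add new content."""
    memory = memory * (1.0 - np.outer(w, erase))
    return memory + np.outer(w, add)

# Assumed toy sizes: 8 memory slots, each a 4-dimensional vector.
rng = np.random.default_rng(0)
memory = rng.standard_normal((8, 4))
key = memory[3].copy()                      # query that matches slot 3
w = content_weights(memory, key, beta=50.0)
r = read(memory, w)
new_memory = write(memory, w, erase=np.ones(4), add=key)
```

Because the weights are a softmax, every slot receives a nonzero share of each read and write, which is what keeps the addressing continuous and differentiable. A large `beta` concentrates the weight on the best-matching slot, while a small `beta` spreads it across many slots.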


Pages: 81-85

Page count: 5
