Convolutional Neural Turing Machine for Speech Separation

Authors
Chien, Jen-Tzung [1]
Tsou, Kai-Wei [1]
Affiliations
[1] Natl Chiao Tung Univ, Dept Elect & Comp Engn, Hsinchu, Taiwan
Keywords
Recurrent neural network; convolutional neural network; neural Turing machine; monaural speech separation
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Long short-term memory (LSTM) has been successfully applied to monaural speech separation. Temporal information is learned through dynamic states that evolve over time and are stored as an internal memory, and the spectro-temporal data matrix of the mixed signal is flattened into input vectors. This approach has two limitations. First, the internal memory in LSTM cannot sufficiently characterize long-term information from different sources. Second, flattening smears the temporal correlation and the frequency neighborhood structure of the data. To address these limitations, this paper presents a convolutional neural Turing machine (ConvNTM) in which the feature maps of spectro-temporal data are extracted and embedded in an external memory at each time step. ConvNTM aims to preserve the spectro-temporal structure in long sequential signals, which is exploited to estimate the separated spectral signals. An addressing mechanism continuously calculates the read and write heads, which retrieve and update memory slots, respectively. The memory-augmented source separation is implemented for single-channel speech enhancement. Experimental results show that ConvNTM outperforms LSTM, NTM, and convolutional LSTM for speech enhancement in terms of the short-time objective intelligibility (STOI) measure.
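The read and write heads mentioned in the abstract can be illustrated with a minimal sketch of content-based addressing as used in the original neural Turing machine. This is not the paper's ConvNTM: the abstract does not give its equations, so the memory slots here are plain vectors rather than convolutional feature maps, and the function names (`address`, `read`, `write`) and the sharpness parameter `beta` are assumptions made only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-8)  # small constant avoids division by zero

def address(memory, key, beta=1.0):
    """Soft weights over memory slots: softmax of beta-scaled similarity to the key."""
    scores = [beta * cosine(slot, key) for slot in memory]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read(memory, weights):
    """Read head: weighted sum of all memory slots."""
    width = len(memory[0])
    return [sum(w * slot[j] for w, slot in zip(weights, memory)) for j in range(width)]

def write(memory, weights, erase, add):
    """Write head: each slot is partly erased, then receives the add vector."""
    return [
        [m * (1.0 - w * e) + w * a for m, e, a in zip(slot, erase, add)]
        for w, slot in zip(weights, memory)
    ]

# Hypothetical toy memory with three slots of width two.
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = address(memory, key=[1.0, 0.0], beta=5.0)
read_vector = read(memory, weights)
memory = write(memory, weights, erase=[1.0, 1.0], add=[0.5, 0.5])
```

Because the weights are a softmax rather than a hard index, every slot contributes to each read and write, which is what makes the addressing continuous and differentiable.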
Pages: 81-85
Page count: 5