Convolutional Neural Turing Machine for Speech Separation

Cited by: 0
Authors
Chien, Jen-Tzung [1]
Tsou, Kai-Wei [1]
Affiliations
[1] Natl Chiao Tung Univ, Dept Elect & Comp Engn, Hsinchu, Taiwan
Keywords
Recurrent neural network; convolutional neural network; neural Turing machine; monaural speech separation
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Long short-term memory (LSTM) has been successfully applied to monaural speech separation. Temporal information is learned through dynamic states that evolve over time and are stored as an internal memory, and the spectro-temporal data matrix of the mixed signal is flattened into input vectors. This approach has two limitations. First, the internal memory in LSTM cannot sufficiently characterize long-term information from different sources. Second, the temporal correlation and frequency neighborhood structure are smeared in the flattened vectors. To address these limitations, this paper presents a convolutional neural Turing machine (ConvNTM) in which feature maps of the spectro-temporal data are extracted and embedded in an external memory at each time step. ConvNTM aims to preserve the spectro-temporal structure in long sequential signals, which is exploited to estimate the separated spectral signals. An addressing mechanism continuously calculates the read and write heads, which retrieve and update memory slots, respectively. The memory-augmented source separation is implemented for single-channel speech enhancement. Experimental results show that ConvNTM outperforms LSTM, NTM and convolutional LSTM for speech enhancement in terms of the short-term objective intelligibility measure.
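The abstract describes read and write heads that address slots of an external memory. The sketch below illustrates that idea using the content-based addressing and erase/add update of the generic neural Turing machine, not the ConvNTM formulation itself, which this record does not spell out. Everything in it is an assumption made for illustration: the function names, the memory size (8 slots of 16 values), the sharpness parameter `beta`, and the choice to store plain vectors in each slot rather than convolutional feature maps.

```python
import numpy as np

def content_weights(memory, key, beta):
    """Soft address: softmax over cosine similarity between key and each slot."""
    eps = 1e-8
    sim = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + eps)
    scores = beta * sim
    scores = scores - scores.max()  # subtract max for numerical stability
    w = np.exp(scores)
    return w / w.sum()

def read(memory, w):
    """Read head: weighted sum of memory slots."""
    return w @ memory

def write(memory, w, erase, add):
    """Write head: erase each slot in proportion to w, then add new content."""
    memory = memory * (1.0 - np.outer(w, erase))
    return memory + np.outer(w, add)

# Hypothetical sizes: 8 memory slots, 16 values per slot.
rng = np.random.default_rng(0)
memory = rng.normal(size=(8, 16))

# A key close to slot 3 should place most of the weight on slot 3.
key = memory[3] + 0.01 * rng.normal(size=16)
w = content_weights(memory, key, beta=50.0)

r = read(memory, w)                                         # retrieved vector
new_memory = write(memory, w, erase=np.ones(16), add=key)   # updated memory
```

Because the weights are a softmax rather than a hard index, both reading and writing are differentiable, which is what lets the addressing be trained together with the rest of the network.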
Pages: 81-85
Number of pages: 5