Convolutional Neural Turing Machine for Speech Separation

Cited by: 0
Authors
Chien, Jen-Tzung [1]
Tsou, Kai-Wei [1]
Affiliations
[1] Natl Chiao Tung Univ, Dept Elect & Comp Engn, Hsinchu, Taiwan
Keywords
Recurrent neural network; convolutional neural network; neural Turing machine; monaural speech separation;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Long short-term memory (LSTM) has been successfully applied to monaural speech separation. Temporal information is learned through dynamic states that evolve over time and are stored as an internal memory, and the spectro-temporal data matrix of the mixed signal is flattened into input vectors. This approach has two limitations. First, the internal memory in LSTM cannot sufficiently characterize long-term information from different sources. Second, the temporal correlation and frequency neighborhood structure are smeared in the flattened vectors. To address these limitations, this paper presents a convolutional neural Turing machine (ConvNTM) in which the feature maps of spectro-temporal data are extracted and embedded in an external memory at each time step. ConvNTM aims to preserve the spectro-temporal structure of long sequential signals, which is exploited to estimate the separated spectral signals. An addressing mechanism continuously calculates the read and write heads, which retrieve and update memory slots, respectively. The memory-augmented source separation is implemented for single-channel speech enhancement. Experimental results show that ConvNTM outperforms LSTM, NTM, and convolutional LSTM for speech enhancement in terms of the short-term objective intelligibility measure.
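The read and write heads mentioned in the abstract can be illustrated with a minimal sketch. It assumes the content-based addressing and erase/add write rule of the standard neural Turing machine, applied to memory slots that each hold a 2-D feature map. The abstract does not give the paper's exact formulation, so the function names (content_weights, read, write), the array shapes, and the key strength parameter beta are all hypothetical choices made for illustration.

```python
import numpy as np

def content_weights(memory, key, beta):
    """Soft attention over memory slots by cosine similarity.

    Assumption: standard NTM content addressing, not the paper's exact rule.
    memory: (n_slots, H, W) feature maps; key: (H, W); beta: key strength.
    """
    n_slots = memory.shape[0]
    flat_mem = memory.reshape(n_slots, -1)
    flat_key = key.reshape(-1)
    eps = 1e-8  # avoids division by zero for all-zero slots
    sim = flat_mem @ flat_key / (
        np.linalg.norm(flat_mem, axis=1) * np.linalg.norm(flat_key) + eps
    )
    logits = beta * sim
    logits = logits - logits.max()  # numerically stable softmax
    w = np.exp(logits)
    return w / w.sum()

def read(memory, w):
    """Read head: weighted sum of slot feature maps, shape (H, W)."""
    return np.tensordot(w, memory, axes=1)

def write(memory, w, erase, add):
    """Write head: erase then add, scaled per slot by the weights w.

    erase and add are (H, W) maps; erase values are expected in [0, 1].
    """
    w_b = w[:, None, None]
    return memory * (1.0 - w_b * erase[None]) + w_b * add[None]
```

Because each slot keeps its two-dimensional shape, a retrieved map still has separate frequency and time axes, which is the structure the abstract says is lost when the input is flattened.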
Pages: 81-85
Page count: 5