Convolutional Neural Turing Machine for Speech Separation

Cited by: 0

Authors:

Chien, Jen-Tzung [1]

Tsou, Kai-Wei [1]

Affiliations:

[1] Natl Chiao Tung Univ, Dept Elect & Comp Engn, Hsinchu, Taiwan

Source:

2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2018

Keywords:

Recurrent neural network; convolutional neural network; neural Turing machine; monaural speech separation;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Long short-term memory (LSTM) has been successfully developed for monaural speech separation. Temporal information is learned through dynamic states that evolve over time and are stored as an internal memory, and the spectro-temporal data matrix of the mixed signal is flattened into input vectors. This approach has two limitations. First, the internal memory in LSTM cannot sufficiently characterize long-term information from different sources. Second, the temporal correlation and frequency neighborhood structure are smeared in the flattened vectors. To deal with these limitations, this paper presents a convolutional neural Turing machine (ConvNTM) in which the feature maps of spectro-temporal data are extracted and embedded in an external memory at each time step. ConvNTM aims to preserve the spectro-temporal structure in long sequential signals, which is exploited to estimate the separated spectral signals. An addressing mechanism is introduced to continuously calculate the read and write heads, which retrieve and update memory slots, respectively. The memory-augmented source separation is implemented for single-channel speech enhancement. Experimental results illustrate the superiority of ConvNTM over LSTM, NTM and convolutional LSTM for speech enhancement in terms of the short-time objective intelligibility (STOI) measure.
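The abstract does not give the addressing equations, so the following is only a minimal sketch of how read and write heads could retrieve and update memory slots. It assumes the content-based addressing and erase-then-add write of the standard neural Turing machine, not the specific ConvNTM formulation. The function names (`content_weights`, `read`, `write`), the sharpening parameter `beta`, and the toy memory are hypothetical, and each slot is a plain vector standing in for a flattened feature map.

```python
import math

def content_weights(memory, key, beta):
    """Softmax over scaled cosine similarity between `key` and each slot."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / (norm + 1e-8)  # small constant avoids division by zero
    scores = [beta * cosine(slot, key) for slot in memory]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read(memory, w):
    """Read vector: weighted sum of memory slots."""
    width = len(memory[0])
    return [sum(w[i] * memory[i][j] for i in range(len(memory))) for j in range(width)]

def write(memory, w, erase, add):
    """Erase-then-add update; returns a new memory and leaves the input intact."""
    return [
        [m * (1.0 - w[i] * e) + w[i] * a for m, e, a in zip(slot, erase, add)]
        for i, slot in enumerate(memory)
    ]

# Toy memory (assumed sizes): 3 slots of width 3.
memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
key = [0.0, 1.0, 0.0]
w = content_weights(memory, key, beta=5.0)   # attention over slots
r = read(memory, w)                          # retrieved content
new_memory = write(memory, w, erase=[1.0, 1.0, 1.0], add=[0.0, 0.0, 0.0])
```

Here the key matches the second slot, so the weights concentrate there, and the full-erase write mostly clears that slot while leaving the others nearly unchanged. A convolutional variant would presumably keep each slot as a 2-D feature map rather than a flat vector, but that detail is not recoverable from the abstract.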


Pages: 81-85

Page count: 5
