A Regression Approach to Single-Channel Speech Separation Via High-Resolution Deep Neural Networks

Cited by: 76
Authors
Du, Jun [1]
Tu, Yanhui [1]
Dai, Li-Rong [1]
Lee, Chin-Hui [2]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei 230027, Peoples R China
[2] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
Keywords
Deep neural network; divide and conquer; dual outputs; robust speech recognition; speech separation; algorithm; CASA
DOI
10.1109/TASLP.2016.2558822
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
We propose a novel data-driven approach to single-channel speech separation based on deep neural networks (DNNs) to directly model the highly nonlinear relationship between speech features of a mixed signal containing a target speaker and other interfering speakers. We focus our discussion on a semisupervised mode that separates speech of the target speaker from an unknown interfering speaker, which is more flexible than the conventional supervised mode requiring known information about both the target and interfering speakers. Two key issues are investigated. First, we propose a DNN architecture with dual outputs of the features of both the target and interfering speakers, which is shown to achieve a better generalization capability than one with output features of only the target speaker. Second, we propose using a set of multiple DNNs, each intended to be signal-noise-dependent (SND), to cope with the difficulty that one single general DNN cannot well accommodate all the speaker mixing variabilities at different signal-to-noise ratio (SNR) levels. Experimental results on the speech separation challenge (SSC) data demonstrate that our proposed framework achieves better separation results than other conventional approaches in a supervised or semisupervised mode. SND-DNNs also yield significant performance improvements over a general DNN for speech separation in low-SNR cases. Furthermore, for automatic speech recognition (ASR) following speech separation, this purely front-end processing with a single set of speaker-independent ASR acoustic models achieves a relative word error rate (WER) reduction of 11.6% over a state-of-the-art separation and recognition system in which a complicated joint back-end decoding framework with multiple sets of speaker-dependent ASR acoustic models must be implemented. When speaker-adaptive ASR acoustic models for the target speakers are adopted for the enhanced signals, a further 12.1% WER reduction over our best speaker-independent ASR system is achieved.
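To make the abstract's two key ideas concrete, the PyTorch sketch below illustrates (a) a regression DNN with dual output heads that maps a context window of mixture features to feature estimates of both the target and the interfering speaker, and (b) runtime selection among SNR-band-specific (SND) models. This is a minimal illustration under assumptions, not the paper's implementation: the log-power spectral (LPS) feature choice, all layer sizes, the class and function names, and the SNR band centers in select_snd_dnn are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualOutputDNN(nn.Module):
    """Regression DNN mapping a context window of mixture log-power
    spectral (LPS) frames to LPS estimates of BOTH the target and the
    interfering speaker (the dual-output design described in the
    abstract). All sizes here are illustrative."""

    def __init__(self, n_frames=7, n_bins=257, n_hidden=2048):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(n_frames * n_bins, n_hidden), nn.Sigmoid(),
            nn.Linear(n_hidden, n_hidden), nn.Sigmoid(),
        )
        # One linear regression head per source.
        self.target_head = nn.Linear(n_hidden, n_bins)
        self.interferer_head = nn.Linear(n_hidden, n_bins)

    def forward(self, x):
        h = self.shared(x)
        return self.target_head(h), self.interferer_head(h)

def dual_mse_loss(pred_t, pred_i, ref_t, ref_i):
    """Sum of per-source MSEs: regressing against both reference
    feature streams is what distinguishes the dual-output model from
    a target-only one."""
    return F.mse_loss(pred_t, ref_t) + F.mse_loss(pred_i, ref_i)

def select_snd_dnn(snr_db, snd_models):
    """Pick the signal-noise-dependent (SND) DNN whose training SNR
    band is closest to the mixture's estimated SNR. The band centers
    used below are hypothetical; the paper's partition may differ."""
    return snd_models[min(snd_models, key=lambda band: abs(band - snr_db))]

# Usage sketch: route a batch of mixture feature vectors to the
# SND model for its estimated SNR, then read off both source estimates.
snd_models = {-9: DualOutputDNN(), -3: DualOutputDNN(),
               3: DualOutputDNN(),  9: DualOutputDNN()}
x = torch.randn(8, 7 * 257)            # 8 stacked-frame LPS vectors
model = select_snd_dnn(snr_db=-5.0, snd_models=snd_models)
lps_target, lps_interferer = model(x)  # each of shape (8, 257)
```

Per the abstract, the dual-output regression generalizes better than a target-only one, and switching among SND models rather than using one general DNN helps most at low SNRs.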
Pages: 1424 - 1437
Number of pages: 14
Related Papers
50 records in total
  • [1] A Gender Mixture Detection Approach to Unsupervised Single-Channel Speech Separation Based on Deep Neural Networks
    Wang, Yannan
    Du, Jun
    Dai, Li-Rong
    Lee, Chin-Hui
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (07) : 1535 - 1546
  • [2] Unsupervised Single-Channel Speech Separation via Deep Neural Network for Different Gender Mixtures
    Wang, Yannan
    Du, Jun
    Dai, Li-Rong
    Lee, Chin-Hui
    [J]. 2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
  • [3] Discriminatively Trained Recurrent Neural Networks for Single-Channel Speech Separation
    Weninger, Felix
    Hershey, John R.
    Le Roux, Jonathan
    Schuller, Bjoern
    [J]. 2014 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP), 2014, : 577 - 581
  • [4] Perceptual Weighting Deep Neural Networks for Single-Channel Speech Enhancement
    Han, Wei
    Zhang, Xiongwei
    Min, Gang
    Zhou, Xingyu
    Zhang, Wei
    [J]. PROCEEDINGS OF THE 2016 12TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA), 2016, : 446 - 450
  • [5] Single-Channel Mixed Speech Recognition Using Deep Neural Networks
    Weng, Chao
    Yu, Dong
    Seltzer, Michael L.
    Droppo, Jasha
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [6] Ensemble System of Deep Neural Networks for Single-Channel Audio Separation
    Al-Kaltakchi, Musab T. S.
    Mohammad, Ahmad Saeed
    Woo, Wai Lok
    [J]. INFORMATION, 2023, 14 (07)
  • [7] Single-Channel Speech Separation with Memory-Enhanced Recurrent Neural Networks
    Weninger, Felix
    Eyben, Florian
    Schuller, Bjoern
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [8] Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition
    Weng, Chao
    Yu, Dong
    Seltzer, Michael L.
    Droppo, Jasha
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (10) : 1670 - 1679
  • [9] Single-Channel Speech Separation Based on Gaussian Process Regression
    Le Dinh Nguyen
    Chen, Sih-Huei
    Tai, Tzu-Chiang
    Wang, Jia-Ching
    [J]. 2018 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM 2018), 2018, : 275 - 278
  • [10] Linear Regression on Sparse Features for Single-Channel Speech Separation
    Schmidt, Mikkel N.
    Olsson, Rasmus K.
    [J]. 2007 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, 2007, : 149 - 152