A two-stage phase-aware approach for monaural multi-talker speech separation

Cited by: 0

Authors:

Yin L. [1,2]

Li J. [1,2]

Yan Y. [1,2,3]

Akagi M. [4]

Affiliations:

[1] University of Chinese Academy of Sciences, Beijing

[2] Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing

[3] Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Xinjiang

[4] Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology, Nomi-shi

Source:

IEICE Transactions on Information and Systems | 2020 / Vol. E103.D / No. 07

Keywords:

Amplitude estimation; Deep learning; Mask estimation; Phase recovery; Speech separation

DOI:

10.1587/TRANSINF.2019EDP7259

Abstract:

Simultaneous utterances degrade speech recognition by both hearing-impaired listeners and automatic speech recognition systems. Recently, deep neural networks have dramatically improved speech separation performance. However, most previous works estimate only the speech magnitude and reuse the mixture phase for speech reconstruction, and this reuse of the mixture phase has become a critical limitation on separation performance. This study proposes a two-stage phase-aware approach for multi-talker speech separation that recovers both the magnitude and the phase. For phase recovery, the Multiple Input Spectrogram Inversion (MISI) algorithm is used because of its effectiveness and simplicity. The study implements a mask-based version of MISI and shows that the ideal amplitude mask (IAM) is the optimal mask for mask-based MISI phase recovery, as it introduces less phase distortion. To compensate for phase-recovery errors and minimize signal distortion, an advanced mask is proposed for magnitude estimation. The IAM and the proposed mask are estimated in different stages to recover the phase and the magnitude, respectively. Two neural network frameworks are evaluated for magnitude estimation in the second stage, demonstrating the effectiveness and flexibility of the proposed approach. The experimental results show that the proposed approach significantly reduces the distortion of the separated speech.

Copyright © 2020 The Institute of Electronics, Information and Communication Engineers
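The abstract describes mask-based MISI phase recovery with the IAM only at a high level. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It follows the general MISI iteration: reconstruct each source from its estimated magnitude and current phase, distribute the mixture-consistency error equally among the sources, and keep only the updated phase. Magnitudes are obtained as mask times mixture magnitude. The STFT settings (512-point periodic Hann window, hop of 128), the hypothetical helper names `stft`, `istft`, `ideal_amplitude_mask` and `misi`, and the unclipped oracle IAM are all illustrative assumptions. The paper's second-stage magnitude mask and its neural networks are not modeled.

```python
import numpy as np

# Assumed STFT settings for illustration only; not taken from the paper.
N_FFT, HOP = 512, 128
WIN = np.hanning(N_FFT + 1)[:-1]  # periodic Hann window


def stft(x):
    """Zero-pad both ends, frame, window, and FFT -> shape (frames, bins)."""
    xp = np.pad(x, (N_FFT // 2, N_FFT // 2))
    n_frames = 1 + (len(xp) - N_FFT) // HOP
    frames = np.stack([xp[t * HOP:t * HOP + N_FFT] * WIN for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(X, length):
    """Weighted overlap-add inverse of stft(); exact for unmodified spectrograms."""
    frames = np.fft.irfft(X, n=N_FFT, axis=1) * WIN
    out_len = N_FFT + HOP * (X.shape[0] - 1)
    out, norm = np.zeros(out_len), np.zeros(out_len)
    for t, frame in enumerate(frames):
        out[t * HOP:t * HOP + N_FFT] += frame
        norm[t * HOP:t * HOP + N_FFT] += WIN ** 2
    norm[norm < 1e-8] = 1.0  # avoid division by zero where no window overlaps
    start = N_FFT // 2
    return (out / norm)[start:start + length]


def ideal_amplitude_mask(s, y, eps=1e-8):
    """IAM = |S| / |Y|, left unclipped because oracle sources are assumed."""
    return np.abs(stft(s)) / (np.abs(stft(y)) + eps)


def misi(y, masks, n_iter=20):
    """Mask-based MISI sketch: |S_i| ~ M_i * |Y|; only the phases are refined."""
    Y = stft(y)
    mags = [m * np.abs(Y) for m in masks]
    phases = [np.angle(Y) for _ in masks]  # initialize with the mixture phase
    n_src = len(masks)
    for _ in range(n_iter):
        x_hat = [istft(a * np.exp(1j * p), len(y)) for a, p in zip(mags, phases)]
        err = y - sum(x_hat)  # mixture-consistency error in the time domain
        s_hat = [x + err / n_src for x in x_hat]  # share the error equally
        phases = [np.angle(stft(s)) for s in s_hat]  # keep only the new phase
    return [istft(a * np.exp(1j * p), len(y)) for a, p in zip(mags, phases)]
```

With oracle IAMs, `M_i * |Y|` reproduces `|S_i|` up to the small `eps`, so only the phase has to be recovered. This matches the abstract's motivation for pairing the IAM with MISI. In the proposed approach the IAM would be estimated by a network rather than computed from oracle sources.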

Pages: 1732-1743

Number of pages: 11
