A two-stage phase-aware approach for monaural multi-talker speech separation

Authors
Yin L. [1, 2]
Li J. [1, 2]
Yan Y. [1, 2, 3]
Akagi M. [4]
Affiliations
[1] University of Chinese Academy of Sciences, Beijing
[2] Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing
[3] Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Xinjiang
[4] Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology, Nomi-shi
Keywords
Amplitude estimation; Deep learning; Mask estimation; Phase recovery; Speech separation
DOI
10.1587/TRANSINF.2019EDP7259
Abstract
Simultaneous utterances impair the performance of both hearing-impaired listeners and automatic speech recognition systems. Recently, deep neural networks have dramatically improved speech separation performance. However, most previous works estimate only the speech magnitude and reuse the mixture phase for speech reconstruction, which has become a critical limitation on separation performance. This study proposes a two-stage phase-aware approach for multi-talker speech separation that jointly recovers the magnitude and the phase. For phase recovery, the Multiple Input Spectrogram Inversion (MISI) algorithm is used because of its effectiveness and simplicity. The study implements the MISI algorithm on the basis of masks and shows that the ideal amplitude mask (IAM) is the optimal mask for mask-based MISI phase recovery, since it introduces less phase distortion. To compensate for the phase recovery error and minimize signal distortion, an advanced mask is proposed for magnitude estimation. The IAM and the proposed mask are estimated at different stages to recover the phase and the magnitude, respectively. Two neural network frameworks are evaluated for magnitude estimation in the second stage, demonstrating the effectiveness and flexibility of the proposed approach. The experimental results demonstrate that the proposed approach significantly reduces the distortion of the separated speech. Copyright © 2020 The Institute of Electronics, Information and Communication Engineers
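As a rough illustration of the ideal amplitude mask mentioned in the abstract, the sketch below uses the standard definition IAM = |S| / |Y| (source magnitude over mixture magnitude) on synthetic signals. It is not the authors' implementation and does not cover MISI or the proposed second-stage mask. The `stft` helper, the frame and hop sizes, and the test signals are all assumptions made for this example.

```python
import numpy as np

def stft(x, frame=64, hop=32):
    """Hypothetical minimal STFT: Hann-windowed frames, one real FFT per frame."""
    window = np.hanning(frame)
    starts = range(0, len(x) - frame + 1, hop)
    return np.array([np.fft.rfft(window * x[s:s + frame]) for s in starts])

def ideal_amplitude_mask(source_spec, mixture_spec, eps=1e-8):
    """IAM = |S| / |Y|; eps guards against division by zero."""
    return np.abs(source_spec) / (np.abs(mixture_spec) + eps)

# Synthetic two-source mixture (assumed signals, for illustration only).
rng = np.random.default_rng(0)
t = np.arange(1024) / 8000.0
s1 = np.sin(2 * np.pi * 440 * t)          # tonal "talker"
s2 = 0.5 * rng.standard_normal(t.size)    # noise-like "talker"
y = s1 + s2

S1, Y = stft(s1), stft(y)

# Applying the IAM to the mixture magnitude recovers the source magnitude.
est_mag = ideal_amplitude_mask(S1, Y) * np.abs(Y)
mag_error = np.max(np.abs(est_mag - np.abs(S1)))

# Pairing that magnitude with the mixture phase still leaves a complex-domain
# error, which is the limitation that motivates explicit phase recovery.
est_spec = est_mag * np.exp(1j * np.angle(Y))
phase_error = np.linalg.norm(est_spec - S1) / np.linalg.norm(S1)

print(f"max magnitude error: {mag_error:.2e}")
print(f"relative error with mixture phase: {phase_error:.3f}")
```

Even with a perfect magnitude, the residual error in the second figure comes entirely from the borrowed mixture phase, which is the gap the abstract's phase recovery stage targets.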
Pages: 1732-1743
Number of pages: 11