A two-stage phase-aware approach for monaural multi-talker speech separation

Cited by: 0
Authors
Yin L. [1,2]
Li J. [1,2]
Yan Y. [1,2,3]
Akagi M. [4]
Affiliations
[1] University of Chinese Academy of Sciences, Beijing
[2] Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing
[3] Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Xinjiang
[4] Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology, Nomi-shi
Source: IEICE Transactions on Information and Systems
Keywords
Amplitude estimation; Deep learning; Mask estimation; Phase recovery; Speech separation
DOI: 10.1587/TRANSINF.2019EDP7259
Abstract
Simultaneous utterances impair speech understanding for both hearing-impaired listeners and automatic speech recognition systems. Recently, deep neural networks have dramatically improved speech separation performance. However, most previous works estimate only the speech magnitude and reuse the mixture phase for reconstruction, and this reliance on the mixture phase has become a critical limitation on separation performance. This study proposes a two-stage phase-aware approach for multi-talker speech separation that recovers both the magnitude and the phase. For phase recovery, the Multiple Input Spectrogram Inversion (MISI) algorithm is adopted for its effectiveness and simplicity. The study implements MISI on a mask basis and shows that the ideal amplitude mask (IAM) is the optimal mask for mask-based MISI phase recovery, introducing the least phase distortion. To compensate for the residual error of phase recovery and minimize signal distortion, an advanced mask is proposed for magnitude estimation. The IAM and the proposed mask are estimated at separate stages to recover the phase and the magnitude, respectively. Two neural network frameworks are evaluated for the second-stage magnitude estimation, demonstrating the effectiveness and flexibility of the proposed approach. Experimental results show that the proposed approach significantly reduces the distortion of the separated speech. Copyright © 2020 The Institute of Electronics, Information and Communication Engineers
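The abstract references two components that a short illustration can make concrete: the ideal amplitude mask, IAM_i = |S_i| / |Y| (clean-source magnitude over mixture magnitude), and mask-based MISI phase recovery. Below is a minimal sketch of the standard MISI iteration in Python; it is not the paper's implementation, and the function and parameter names (misi, n_fft, hop, n_iter) are illustrative. It assumes the estimated magnitudes (e.g., an estimated IAM applied to |Y|) have the same shape as the mixture spectrogram.

```python
import numpy as np
from scipy.signal import stft, istft


def _fit(x, n):
    """Zero-pad or truncate x to length n so the STFT grid stays consistent."""
    return np.pad(x, (0, max(0, n - len(x))))[:n]


def misi(mixture, est_mags, n_fft=512, hop=128, n_iter=20):
    """Multiple Input Spectrogram Inversion: recover source phases given
    a time-domain mixture and a list of estimated magnitude spectrograms."""
    noverlap = n_fft - hop
    n = len(mixture)
    _, _, Y = stft(mixture, nperseg=n_fft, noverlap=noverlap)
    # Initialize every source with the mixture phase.
    phases = [np.angle(Y)] * len(est_mags)
    for _ in range(n_iter):
        # Impose the estimated magnitudes on the current phases.
        specs = [m * np.exp(1j * p) for m, p in zip(est_mags, phases)]
        sigs = [_fit(istft(S, nperseg=n_fft, noverlap=noverlap)[1], n)
                for S in specs]
        # Distribute the mixture-consistency error equally over the sources.
        err = (mixture - sum(sigs)) / len(sigs)
        # Re-analyze the corrected signals; keep only their phases.
        phases = [np.angle(stft(s + err, nperseg=n_fft, noverlap=noverlap)[2])
                  for s in sigs]
    # Final resynthesis with the estimated magnitudes and recovered phases.
    specs = [m * np.exp(1j * p) for m, p in zip(est_mags, phases)]
    return [_fit(istft(S, nperseg=n_fft, noverlap=noverlap)[1], n) for S in specs]
```

Consistent with the abstract's two-stage design, phases recovered this way from a first-stage IAM estimate would then be combined with the second-stage magnitude estimates for the final reconstruction.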
Pages: 1732-1743
Page count: 11
Related papers
50 items in total
  • [21] PERMUTATION INVARIANT TRAINING OF DEEP MODELS FOR SPEAKER-INDEPENDENT MULTI-TALKER SPEECH SEPARATION
Yu, Dong
Kolbaek, Morten
    Tan, Zheng-Hua
    Jensen, Jesper
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 241 - 245
  • [22] A VISUAL-PILOT DEEP FUSION FOR TARGET SPEECH SEPARATION IN MULTI-TALKER NOISY ENVIRONMENT
    Li, Yun
    Liu, Zhang
    Na, Yueyue
    Wang, Ziteng
    Tian, Biao
    Fu, Qiang
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2020, : 4442 - 4446
  • [23] Permutation invariant training of deep models for speaker-independent multi-talker speech separation
    Takahashi, Kohei
    Shiraishi, Toshihiko
MECHANICAL ENGINEERING JOURNAL, 2023
  • [24] First coarse, fine afterward: A lightweight two-stage complex approach for monaural speech enhancement
    Dang, Feng
    Chen, Hangting
    Hu, Qi
    Zhang, Pengyuan
    Yan, Yonghong
    SPEECH COMMUNICATION, 2023, 146 : 32 - 44
  • [25] Speaker-Independent Audio-Visual Speech Separation Based on Transformer in Multi-Talker Environments
    Wang, Jing
    Luo, Yiyu
    Yi, Weiming
    Xie, Xiang
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2022, E105D (04) : 766 - 777
  • [26] Spatial Separation Benefit for Speech Detection in Multi-Talker Babble-Noise with Different Egocentric Distances
    Andreeva, I. G.
    Dymnikowa, M.
    Gvozdeva, A. P.
    Ogorodnikova, E. A.
    Pak, S. P.
    ACTA ACUSTICA UNITED WITH ACUSTICA, 2019, 105 (03) : 484 - 491
  • [27] Two Heads are Better Than One: A Two-Stage Complex Spectral Mapping Approach for Monaural Speech Enhancement
    Li, Andong
    Liu, Wenzhe
    Zheng, Chengshi
    Fan, Cunhang
    Li, Xiaodong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 1829 - 1843
  • [28] Deep neural networks based binary classification for single channel speaker independent multi-talker speech separation
    Saleem, Nasir
    Khattak, Muhammad Irfan
    APPLIED ACOUSTICS, 2020, 167
  • [29] JOINT SEPARATION AND DENOISING OF NOISY MULTI-TALKER SPEECH USING RECURRENT NEURAL NETWORKS AND PERMUTATION INVARIANT TRAINING
    Kolbaek, Morten
    Yu, Dong
    Tan, Zheng-Hua
    Jensen, Jesper
2017 IEEE 27TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, 2017
  • [30] A Two-Stage Approach to Noisy Cochannel Speech Separation with Gated Residual Networks
    Tan, Ke
    Wang, DeLiang
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3484 - 3488