A two-stage phase-aware approach for monaural multi-talker speech separation

Cited by: 0

Authors
Yin L. [1 ,2 ]
Li J. [1 ,2 ]
Yan Y. [1 ,2 ,3 ]
Akagi M. [4 ]
Affiliations
[1] University of Chinese Academy of Sciences, Beijing
[2] Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing
[3] Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Xinjiang
[4] Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology, Nomi-shi
Keywords
Amplitude estimation; Deep learning; Mask estimation; Phase recovery; Speech separation
DOI
10.1587/TRANSINF.2019EDP7259
Abstract
Simultaneous utterances impair both the listening ability of hearing-impaired persons and the performance of automatic speech recognition systems. Recently, deep neural networks have dramatically improved speech separation performance. However, most previous works estimate only the speech magnitude and reuse the mixture phase for speech reconstruction; the use of the mixture phase has become a critical limitation on separation performance. This study proposes a two-stage phase-aware approach for multi-talker speech separation that recovers both the magnitude and the phase. For phase recovery, the Multiple Input Spectrogram Inversion (MISI) algorithm is adopted for its effectiveness and simplicity. The study implements the MISI algorithm based on a mask and shows that the ideal amplitude mask (IAM) is the optimal mask for mask-based MISI phase recovery, since it introduces less phase distortion. To compensate for the residual error of phase recovery and minimize signal distortion, an advanced mask is proposed for the magnitude estimation. The IAM and the proposed mask are estimated at separate stages to recover the phase and the magnitude, respectively. Two neural-network frameworks are evaluated for magnitude estimation in the second stage, demonstrating the effectiveness and flexibility of the proposed approach. Experimental results show that the proposed approach significantly reduces distortion in the separated speech. Copyright © 2020 The Institute of Electronics, Information and Communication Engineers
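The MISI phase recovery described in the abstract can be sketched as follows. This is an illustrative reimplementation using SciPy, not the authors' code; the function name `misi`, the STFT settings, and the equal error-distribution step are assumptions based on the standard MISI formulation, which iteratively re-imposes the estimated magnitudes while redistributing the mixture reconstruction error among the talkers.

```python
import numpy as np
from scipy.signal import stft, istft

def _istft_fixed(spec, length, nperseg):
    # Inverse STFT, zero-padded or trimmed to a fixed number of samples.
    sig = istft(spec, nperseg=nperseg)[1]
    if len(sig) < length:
        sig = np.pad(sig, (0, length - len(sig)))
    return sig[:length]

def misi(mixture, est_mags, n_iter=10, nperseg=256):
    """Multiple Input Spectrogram Inversion (illustrative sketch).

    mixture:  1-D time-domain mixture signal
    est_mags: per-talker magnitude spectrograms, e.g. mask * |STFT(mixture)|
    """
    X = stft(mixture, nperseg=nperseg)[2]
    n_src = len(est_mags)
    # Initialise every talker with the mixture phase (the usual baseline).
    specs = [m * np.exp(1j * np.angle(X)) for m in est_mags]
    for _ in range(n_iter):
        sigs = [_istft_fixed(S, len(mixture), nperseg) for S in specs]
        # Distribute the mixture reconstruction error equally among talkers.
        err = (mixture - sum(sigs)) / n_src
        # Re-impose the estimated magnitudes, keeping only the updated phases.
        specs = [m * np.exp(1j * np.angle(stft(s + err, nperseg=nperseg)[2]))
                 for m, s in zip(est_mags, sigs)]
    return [_istft_fixed(S, len(mixture), nperseg) for S in specs]
```

With the ideal amplitude mask, `est_mags[i]` equals `(|S_i| / |X|) * |X| = |S_i|`, i.e. the clean source magnitude, which is consistent with the abstract's claim that the IAM is the natural choice for this phase-recovery stage.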
Pages: 1732–1743
Page count: 11
Related papers
50 entries in total
  • [1] A Two-Stage Phase-Aware Approach for Monaural Multi-Talker Speech Separation
    Yin, Lu
    Li, Junfeng
    Yan, Yonghong
    Akagi, Masato
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2020, E103D (07): : 1732 - 1743
  • [2] Monaural multi-talker speech recognition using factorial speech processing models
    Khademian, Mahdi
    Homayounpour, Mohammad Mehdi
    SPEECH COMMUNICATION, 2018, 98 : 1 - 16
  • [3] MONAURAL SPEECH SEPARATION USING A PHASE-AWARE DEEP DENOISING AUTO ENCODER
    Williamson, Donald S.
    2018 IEEE 28TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2018,
  • [4] Learning Contextual Language Embeddings for Monaural Multi-talker Speech Recognition
    Zhang, Wangyou
    Qian, Yanmin
    INTERSPEECH 2020, 2020, : 304 - 308
  • [5] Two-Stage Multi-Target Joint Learning for Monaural Speech Separation
    Nie, Shuai
    Liang, Shan
    Xue, Wei
    Zhang, Xueliang
    Liu, Wenju
    Dong, Like
    Yang, Hong
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1503 - 1507
  • [6] Monaural Multi-Talker Speech Recognition with Attention Mechanism and Gated Convolutional Networks
    Chang, Xuankai
    Qian, Yanmin
    Yu, Dong
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1586 - 1590
  • [7] ADAPTIVE PERMUTATION INVARIANT TRAINING WITH AUXILIARY INFORMATION FOR MONAURAL MULTI-TALKER SPEECH RECOGNITION
    Chang, Xuankai
    Qian, Yanmin
    Yu, Dong
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5974 - 5978
  • [8] Complex ISNMF: A Phase-Aware Model for Monaural Audio Source Separation
    Magron, Paul
    Virtanen, Tuomas
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (01) : 20 - 31
  • [9] Multi-talker Speech Separation Based on Permutation Invariant Training and Beamforming
    Yin, Lu
    Wang, Ziteng
    Xia, Risheng
    Li, Junfeng
    Yan, Yonghong
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 851 - 855
  • [10] STREAMING NOISE CONTEXT AWARE ENHANCEMENT FOR AUTOMATIC SPEECH RECOGNITION IN MULTI-TALKER ENVIRONMENTS
    Caroselli, Joe
    Narayanan, Arun
    Huang, Yiteng
    2022 INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC 2022), 2022,