Phase-Aware Speech Enhancement With Complex Wiener Filter

Cited by: 0
Authors
Nguyen, Huy [1]
Ho, Tuan Vu [2]
Akagi, Masato [1]
Unoki, Masashi [1]
Affiliations
[1] Japan Adv Inst Sci & Technol JAIST, Grad Sch Adv Sci & Technol, Nomi, Ishikawa 9231292, Japan
[2] Hitachi Ltd, Adv Artificial Intelligent Innovat Ctr, Media Intelligent Proc Research Dept, Tokyo 1858601, Japan
Keywords
Speech enhancement; complex Wiener filter; vector-quantized variational autoencoder; noise reduction; spectral fine structure enhancement; F0; distribution; NOISE; MASK;
DOI
10.1109/ACCESS.2023.3341919
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In speech enhancement, accurate phase reconstruction can significantly improve speech quality. While phase-aware speech enhancement methods using the complex ideal ratio mask (cIRM) have shown promise, the difficulty of estimating the phase is shared between the real and imaginary parts of the cIRM. The lack of a clear pattern in the imaginary part poses particular difficulties. To address this issue, we propose a phase-aware speech enhancement method that uses a complex Wiener filter, which delegates the estimation of the speech and noise amplitude properties and of the phase property to different models, mitigating the issues with the cIRM and improving the effectiveness of neural-network training. Our method uses a speech-variance estimation model with a noise-robust vector-quantized variational autoencoder and a phase corrector that maximizes the scale-invariant signal-to-noise ratio in the time domain. To further improve speech-variance estimation, we propose a loss function that uses a categorical distribution of the fundamental frequency (F0) to enhance the spectral fine structure of the estimated speech variance. We evaluated our method on the open dataset released by Valentini et al. to compare it directly with other speech-enhancement methods. Our method achieved a perceptual evaluation of speech quality (PESQ) score of 2.86 and a short-time objective intelligibility (STOI) score of 0.94, better than the state-of-the-art method based on cIRM estimation during the 2020 Deep Noise Challenge. Our comprehensive analysis shows that incorporating the proposed loss function for spectral-fine-structure enhancement improves speech quality, especially when the F0 is low.
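As a rough illustration of two quantities the abstract names, the sketch below computes a textbook real-valued Wiener gain from speech and noise variance estimates, and the scale-invariant signal-to-noise ratio (SI-SNR) between two time-domain signals. It is not the authors' implementation: the function names `wiener_gain` and `si_snr`, the `eps` stabilizer, and the use of NumPy are assumptions made for this example, and the paper's complex Wiener filter additionally involves a learned phase corrector that is not reproduced here.

```python
import numpy as np


def wiener_gain(speech_var, noise_var, eps=1e-8):
    """Textbook Wiener gain: speech variance over total variance (assumed form)."""
    return speech_var / (speech_var + noise_var + eps)


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D time-domain signals."""
    # Remove the mean so a DC offset does not affect the measure.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to obtain the scaled reference.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    residual = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(1000)  # stand-in for a clean speech signal
    noisy = clean + 0.1 * rng.standard_normal(1000)
    print("Wiener gain at equal variances:", wiener_gain(np.array([1.0]), np.array([1.0])))
    print("SI-SNR (dB):", si_snr(noisy, clean))
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the SI-SNR essentially unchanged, which is what makes it usable as a training objective when the output level is not fixed.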
Pages: 141573-141584
Number of pages: 12
Related papers
50 in total
  • [1] Funnel Deep Complex U-net for Phase-Aware Speech Enhancement
    Sun, Yuhang
    Yang, Linju
    Zhu, Huifeng
    Hao, Jie
    [J]. INTERSPEECH 2021, 2021, : 161 - 165
  • [2] Investigation on the Band Importance of Phase-aware Speech Enhancement
    Zhang, Zhuohuang
    Williamson, Donald S.
    Shen, Yi
    [J]. INTERSPEECH 2022, 2022, : 4651 - 4655
  • [3] DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement
    Hu, Yanxin
    Liu, Yun
    Lv, Shubo
    Xing, Mengtao
    Zhang, Shimin
    Fu, Yihui
    Wu, Jian
    Zhang, Bihong
    Xie, Lei
    [J]. INTERSPEECH 2020, 2020, : 2472 - 2476
  • [4] Phase-Aware Single-channel Speech Enhancement
    Mowlaee, Pejman
    Watanabe, Mario Kaoru
    Saeidi, Rahim
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1871 - 1873
  • [5] Phase-Aware Speech Enhancement Based on Deep Neural Networks
    Zheng, Naijun
    Zhang, Xiao-Lei
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (01) : 63 - 76
  • [6] On Speech Intelligibility Estimation of Phase-Aware Single-Channel Speech Enhancement
    Gaich, Andreas
    Mowlaee, Pejman
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2553 - 2557
  • [7] PACDNN: A phase-aware composite deep neural network for speech enhancement
    Hasannezhad, Mojtaba
    Yu, Hongjiang
    Zhu, Wei-Ping
    Champagne, Benoit
    [J]. SPEECH COMMUNICATION, 2022, 136 : 1 - 13
  • [8] Vector-quantized Variational Autoencoder for Phase-aware Speech Enhancement
    Tuan Vu Ho
    Quoc Huy Nguyen
    Akagi, Masato
    Unoki, Masashi
    [J]. INTERSPEECH 2022, 2022, : 176 - 180
  • [9] A Study on the Benefits of Phase-Aware Speech Enhancement in Challenging Noise Scenarios
    Krawczyk-Becker, Martin
    Gerkmann, Timo
    [J]. LATENT VARIABLE ANALYSIS AND SIGNAL SEPARATION (LVA/ICA 2018), 2018, 10891 : 407 - 416
  • [10] ON SPEECH QUALITY ESTIMATION OF PHASE-AWARE SINGLE-CHANNEL SPEECH ENHANCEMENT
    Gaich, Andreas
    Mowlaee, Pejman
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 216 - 220