Vector-quantized Variational Autoencoder for Phase-aware Speech Enhancement

Cited by: 2
Authors
Tuan Vu Ho [1]
Quoc Huy Nguyen [1]
Akagi, Masato [1]
Unoki, Masashi [1]
Affiliations
[1] Japan Adv Inst Sci & Technol, Nomi, Japan
Source
Keywords
Speech enhancement; vector-quantized variational autoencoder; complex Wiener filter; noise reduction; NOISE
DOI
10.21437/Interspeech.2022-443
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech-enhancement methods based on the complex ideal ratio mask (cIRM) have achieved promising results. These methods typically use a deep neural network to jointly estimate the real and imaginary components of the cIRM, which is defined in the complex domain. However, the cIRM is unbounded, which makes it difficult to train a neural network effectively. To alleviate this problem, this paper proposes a phase-aware speech-enhancement method that estimates the magnitude and phase of a complex adaptive Wiener filter. In this method, a noise-robust vector-quantized variational autoencoder estimates the magnitude of the Wiener filter using the Itakura-Saito divergence in the time-frequency domain, while a convolutional recurrent network estimates the phase of the Wiener filter under a scale-invariant signal-to-noise-ratio constraint in the time domain. The proposed method was evaluated on the open Voice Bank+DEMAND dataset to allow a direct comparison with other speech-enhancement methods. It achieved a Perceptual Evaluation of Speech Quality score of 2.85 and a Short-Time Objective Intelligibility score of 0.94, which is better than the state-of-the-art method based on cIRM estimation during the 2020 Deep Noise Challenge.
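The two training criteria named in the abstract, and the way a magnitude and a phase estimate combine into one complex filter, can be illustrated with a minimal NumPy sketch. It uses the textbook definitions of the Itakura-Saito divergence and the scale-invariant SNR; the abstract does not give the paper's exact loss formulation (argument order, averaging, weighting), so those details are assumptions, and the function names `itakura_saito`, `si_snr`, and `apply_complex_filter` are hypothetical.

```python
import numpy as np


def itakura_saito(p_ref, p_est, eps=1e-12):
    """Mean Itakura-Saito divergence between two power spectra.

    Textbook form: d(p, q) = p/q - log(p/q) - 1, averaged over all bins.
    The argument order and the averaging are assumptions, not the paper's.
    """
    ratio = (p_ref + eps) / (p_est + eps)
    return float(np.mean(ratio - np.log(ratio) - 1.0))


def si_snr(est, ref, eps=1e-12):
    """Scale-invariant SNR in dB between two 1-D time-domain signals."""
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference to get the target component.
    target = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref
    noise = est - target
    return float(10.0 * np.log10((np.dot(target, target) + eps)
                                 / (np.dot(noise, noise) + eps)))


def apply_complex_filter(noisy_stft, magnitude, phase):
    """Apply a complex filter given as separate magnitude and phase maps."""
    return magnitude * np.exp(1j * phase) * noisy_stft


# Toy data: random arrays stand in for real spectrograms and waveforms.
rng = np.random.default_rng(0)
power = rng.uniform(0.1, 1.0, size=(4, 8))
clean = rng.standard_normal(1000)
noisy_stft = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
```

Because the filter is parameterized by a magnitude and a phase rather than by unbounded real and imaginary parts, each estimate can be trained against its own criterion, which is the separation the abstract describes.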
Pages: 176-180
Number of pages: 5