Vector-quantized Variational Autoencoder for Phase-aware Speech Enhancement

Cited by: 2
Authors
Tuan Vu Ho [1]
Quoc Huy Nguyen [1]
Akagi, Masato [1]
Unoki, Masashi [1]
Affiliations
[1] Japan Adv Inst Sci & Technol, Nomi, Japan
Source
Keywords
Speech enhancement; vector-quantized variational autoencoder; complex Wiener filter; noise reduction; NOISE;
DOI
10.21437/Interspeech.2022-443
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Speech-enhancement methods based on the complex ideal ratio mask (cIRM) have achieved promising results. These methods often deploy a deep neural network to jointly estimate the real and imaginary components of the cIRM defined in the complex domain. However, the unbounded property of the cIRM makes it difficult to train a neural network effectively. To alleviate this problem, this paper proposes a phase-aware speech-enhancement method that estimates the magnitude and phase of a complex adaptive Wiener filter. In this method, a noise-robust vector-quantized variational autoencoder estimates the magnitude of the Wiener filter using the Itakura-Saito divergence in the time-frequency domain, while a convolutional recurrent network estimates the phase of the Wiener filter under a scale-invariant signal-to-noise-ratio constraint in the time domain. The proposed method was evaluated on the open Voice Bank+DEMAND dataset to provide a direct comparison with other speech-enhancement methods. It achieved a Perceptual Evaluation of Speech Quality score of 2.85 and a Short-Time Objective Intelligibility score of 0.94, which is better than the state-of-the-art method based on cIRM estimation in the 2020 Deep Noise Challenge.
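The abstract names three ingredients: a filter expressed as magnitude and phase, the Itakura-Saito divergence on spectra, and the scale-invariant signal-to-noise ratio on waveforms. The NumPy sketch below illustrates the textbook definitions of these quantities only. It is not the authors' implementation; the function names, the `eps` stabilizer, and the averaging over time-frequency bins are assumptions made for illustration.

```python
import numpy as np


def itakura_saito(p_ref, p_est, eps=1e-8):
    """Mean Itakura-Saito divergence between two power spectrograms.

    Uses the standard form d(x, y) = x/y - log(x/y) - 1, averaged over
    all bins. The eps term and the averaging are illustrative choices.
    """
    ratio = (p_ref + eps) / (p_est + eps)
    return float(np.mean(ratio - np.log(ratio) - 1.0))


def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D time-domain signals."""
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference to get the target component.
    target = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref
    noise = est - target
    power_ratio = (np.dot(target, target) + eps) / (np.dot(noise, noise) + eps)
    return float(10.0 * np.log10(power_ratio))


def apply_complex_filter(noisy_stft, magnitude, phase):
    """Multiply a noisy STFT by a filter given as magnitude and phase."""
    return noisy_stft * (magnitude * np.exp(1j * phase))
```

Splitting the filter into a magnitude and a phase gives each estimator a well-defined target, which is the motivation the abstract gives for moving away from the unbounded cIRM.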
Pages: 176-180
Number of pages: 5
Related papers
50 in total
  • [41] Speaker Adaptive Text-to-Speech With Timbre-Normalized Vector-Quantized Feature
    Du, Chenpeng
    Guo, Yiwei
    Chen, Xie
    Yu, Kai
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 3446 - 3456
  • [42] Maximum a posteriori estimation of spectral gain with harmonic-structure-based phase reconstruction for phase-aware speech enhancement
    Wakabayashi, Yukoh
    Ono, Nobutaka
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 1649 - 1652
  • [43] Sub-band Vector Quantized Variational AutoEncoder for Spectral Envelope Quantization
    Srikotr, Tanasan
    Mano, Kazunori
    PROCEEDINGS OF THE 2019 IEEE REGION 10 CONFERENCE (TENCON 2019): TECHNOLOGY, KNOWLEDGE, AND SOCIETY, 2019, : 296 - 300
  • [45] Whisper Speech Enhancement Using Joint Variational Autoencoder for Improved Speech Recognition
    Agrawal, Vikas
    Kumar, Shashi
    Rath, Shakti P.
    INTERSPEECH 2021, 2021, : 2706 - 2710
  • [46] Improved single channel phase-aware speech enhancement technique for low signal-to-noise ratio signal
    Samui, Suman
    Chakrabarti, Indrajit
    Ghosh, Soumya Kanti
    IET SIGNAL PROCESSING, 2016, 10 (06) : 641 - 650
  • [47] MONAURAL SPEECH SEPARATION USING A PHASE-AWARE DEEP DENOISING AUTO ENCODER
    Williamson, Donald S.
    2018 IEEE 28TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2018,
  • [48] TVQVC: Transformer based Vector Quantized Variational Autoencoder with CTC loss for Voice Conversion
    Chen, Ziyi
    Zhang, Pengyuan
    INTERSPEECH 2021, 2021, : 826 - 830
  • [49] Data augmentation for Gram-stain images based on Vector Quantized Variational AutoEncoder
    Shwetha, V
    Prasad, Keerthana
    Mukhopadhyay, Chiranjay
    Banerjee, Barnini
    NEUROCOMPUTING, 2024, 600
  • [50] DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders
    Liu, Yanqing
    Xue, Ruiqing
    He, Lei
    Tan, Xu
    Zhao, Sheng
    INTERSPEECH 2022, 2022, : 1581 - 1585