Vector-quantized Variational Autoencoder for Phase-aware Speech Enhancement

Cited by: 2
Authors
Tuan Vu Ho [1]
Quoc Huy Nguyen [1]
Akagi, Masato [1]
Unoki, Masashi [1]
Affiliations
[1] Japan Advanced Institute of Science and Technology, Nomi, Japan
Keywords
Speech enhancement; vector-quantized variational autoencoder; complex Wiener filter; noise reduction; noise
DOI
10.21437/Interspeech.2022-443
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech-enhancement methods based on the complex ideal ratio mask (cIRM) have achieved promising results. These methods often deploy a deep neural network to jointly estimate the real and imaginary components of the cIRM, which is defined in the complex domain. However, the cIRM is unbounded, which makes it difficult to train a neural network effectively. To alleviate this problem, this paper proposes a phase-aware speech-enhancement method that estimates the magnitude and phase of a complex adaptive Wiener filter. In this method, a noise-robust vector-quantized variational autoencoder estimates the magnitude of the Wiener filter using the Itakura-Saito divergence in the time-frequency domain, while a convolutional recurrent network estimates the phase of the Wiener filter using a scale-invariant signal-to-noise-ratio constraint in the time domain. The proposed method was evaluated on the open Voice Bank+DEMAND dataset to provide a direct comparison with other speech-enhancement methods. It achieved a Perceptual Evaluation of Speech Quality score of 2.85 and a Short-Time Objective Intelligibility score of 0.94, which is better than the state-of-the-art cIRM-estimation method from the 2020 Deep Noise Suppression Challenge.
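The abstract names two training criteria (the Itakura-Saito divergence and the scale-invariant signal-to-noise ratio) and a filter split into magnitude and phase. The NumPy sketch below illustrates these quantities using their standard textbook definitions only. It is not the authors' implementation: the function names, the `eps` stabilizer, and the argument order of the divergence are assumptions made for illustration, and the record does not say how the paper parameterizes or combines them.

```python
import numpy as np

def itakura_saito(p_ref, p_est, eps=1e-12):
    """Itakura-Saito divergence between two non-negative power spectra.

    Argument order (reference first) is an assumption for this sketch.
    """
    ratio = (p_ref + eps) / (p_est + eps)
    return float(np.sum(ratio - np.log(ratio) - 1.0))

def si_snr(reference, estimate, eps=1e-12):
    """Scale-invariant SNR in dB between two 1-D time-domain signals."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to isolate the target part.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return float(10.0 * np.log10((np.dot(target, target) + eps)
                                 / (np.dot(noise, noise) + eps)))

def apply_complex_filter(noisy_stft, magnitude, phase):
    """Apply a filter given as separate magnitude and phase to a noisy STFT."""
    return magnitude * np.exp(1j * phase) * noisy_stft
```

Because `si_snr` projects the estimate onto the reference, rescaling the estimate leaves the score essentially unchanged, which is the property the term "scale-invariant" refers to.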
Pages: 176-180
Number of pages: 5