Vector-quantized Variational Autoencoder for Phase-aware Speech Enhancement

Cited by: 2
Authors
Tuan Vu Ho [1]
Quoc Huy Nguyen [1]
Masato Akagi [1]
Masashi Unoki [1]
Affiliations
[1] Japan Advanced Institute of Science and Technology, Nomi, Japan
Source
Interspeech 2022
Keywords
Speech enhancement; vector-quantized variational autoencoder; complex Wiener filter; noise reduction; noise
DOI
10.21437/Interspeech.2022-443
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech-enhancement methods based on the complex ideal ratio mask (cIRM) have achieved promising results. These methods often deploy a deep neural network to jointly estimate the real and imaginary components of the cIRM, which is defined in the complex domain. However, the cIRM is unbounded, which makes it difficult to train a neural network effectively. To alleviate this problem, this paper proposes a phase-aware speech-enhancement method that estimates the magnitude and phase of a complex adaptive Wiener filter. In this method, a noise-robust vector-quantized variational autoencoder estimates the magnitude of the Wiener filter using the Itakura-Saito divergence in the time-frequency domain, while a convolutional recurrent network estimates the phase of the Wiener filter using a scale-invariant signal-to-noise-ratio constraint in the time domain. The proposed method was evaluated on the open Voice Bank+DEMAND dataset to provide a direct comparison with other speech-enhancement methods. It achieved a Perceptual Evaluation of Speech Quality score of 2.85 and a Short-Time Objective Intelligibility score of 0.94, which is better than the state-of-the-art method based on cIRM estimation during the 2020 Deep Noise Challenge.
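The two training criteria named in the abstract, and the way a filter magnitude and phase combine into a complex gain, can be illustrated with a minimal NumPy sketch. It uses the standard textbook definitions of the Itakura-Saito divergence and the scale-invariant signal-to-noise ratio; it is not the authors' implementation, and the function names, the `eps` stabilizer, and the averaging over time-frequency bins are assumptions made for illustration.

```python
import numpy as np


def itakura_saito(p, p_hat, eps=1e-8):
    """Mean Itakura-Saito divergence between two power spectrograms.

    Averaging over all bins is an illustrative choice (assumption).
    """
    ratio = (p + eps) / (p_hat + eps)
    # Each term x - log(x) - 1 is non-negative and zero only when x == 1.
    return float(np.mean(ratio - np.log(ratio) - 1.0))


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D time-domain signals."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to remove any gain mismatch.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = scale * target
    e_noise = estimate - s_target
    num = np.dot(s_target, s_target) + eps
    den = np.dot(e_noise, e_noise) + eps
    return float(10.0 * np.log10(num / den))


def apply_complex_filter(noisy_stft, magnitude, phase):
    """Apply a complex gain, given as magnitude and phase, to a noisy STFT."""
    return magnitude * np.exp(1j * phase) * noisy_stft


rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
estimate = clean + 0.1 * rng.standard_normal(16000)
print("SI-SNR (dB):", si_snr(estimate, clean))
```

Splitting the filter into a magnitude and a phase term, as the abstract describes, lets each part be trained with its own criterion: a spectral divergence for the magnitude and a time-domain measure for the phase.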
Pages: 176-180
Number of pages: 5