SPEECH DEREVERBERATION USING VARIATIONAL AUTOENCODERS

Cited by: 2
Authors
Baby, Deepak [1]
Bourlard, Herve [2]
Affiliations
[1] Amazon Alexa, Seattle, WA 98121 USA
[2] Idiap Res Inst, Speech & Audio Proc Grp, Martigny, Switzerland
Funding
Swiss National Science Foundation
Keywords
speech dereverberation; variational autoencoders; non-negative matrix factorization; convolutive transfer function; sparse representations; separation; noise
DOI
10.1109/ICASSP39728.2021.9414736
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper presents a statistical method for single-channel speech dereverberation using a variational autoencoder (VAE) for modelling the speech spectra. One popular approach for modelling speech spectra is to use non-negative matrix factorization (NMF) where learned clean speech spectral bases are used as a linear generative model for speech spectra. This work replaces this linear model with a powerful nonlinear deep generative model based on VAE. Further, this paper formulates a unified probabilistic generative model of reverberant speech based on Gaussian and Poisson distributions. We develop a Monte Carlo expectation-maximization algorithm for inferring the latent variables in the VAE and estimating the room impulse response for both probabilistic models. Evaluation results show the superiority of the proposed VAE-based models over the NMF-based counterparts.
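The abstract gives no equations, so the following is only a minimal sketch of the kind of model it describes, under stated assumptions. It assumes the NMF baseline writes the clean magnitude spectrogram as a product of bases and activations, and that reverberation acts as a per-frequency non-negative convolution along time (suggested by the keyword "convolutive transfer function", not confirmed by this record). The names `nmf_speech_model`, `reverberate`, `W`, `A`, and `H` are hypothetical.

```python
import numpy as np


def nmf_speech_model(W, A):
    """Linear speech model (assumed form): spectra = bases @ activations.

    W: (F, K) non-negative spectral bases; A: (K, T) non-negative activations.
    """
    return W @ A  # (F, T) clean magnitude spectrogram


def reverberate(S, H):
    """Assumed reverberation model: X[f, t] = sum_l H[f, l] * S[f, t - l].

    S: (F, T) clean spectrogram; H: (F, L) per-frequency filter taps.
    """
    F, T = S.shape
    L = H.shape[1]
    X = np.zeros((F, T))
    for l in range(min(L, T)):
        # Delay the clean spectrogram by l frames and weight it by tap l.
        X[:, l:] += H[:, [l]] * S[:, :T - l]
    return X


# Toy example: 2 frequency bins, 2 bases, 3 frames, 2 filter taps.
W = np.array([[1.0, 0.0], [0.0, 2.0]])
A = np.array([[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]])
H = np.array([[1.0, 0.5], [1.0, 0.0]])

S = nmf_speech_model(W, A)
X = reverberate(S, H)

# One of the two observation models named in the abstract is Poisson;
# here the reverberant spectrogram is used as the Poisson rate (an assumption).
rng = np.random.default_rng(0)
Y = rng.poisson(X)
```

In the approach the abstract describes, the linear term `W @ A` would be replaced by the output of a VAE decoder driven by latent variables, while the room response (here `H`) is still estimated from the observed reverberant signal.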
Pages: 5784-5788
Number of pages: 5