Unsupervised Speech Enhancement Using Dynamical Variational Autoencoders

Cited by: 17
Authors
Bie, Xiaoyu [1 ]
Leglaive, Simon [2 ]
Alameda-Pineda, Xavier [1 ]
Girin, Laurent [3 ]
Affiliations
[1] Univ Grenoble Alpes, Inria Grenoble Rhone Alpes, F-38000 Grenoble, France
[2] Cent Supelec, IETR UMR CNRS 6164, F-35576 Cesson Sevigne, France
[3] Univ Grenoble Alpes, GIPSA Lab, CNRS, Grenoble INP, F-38402 Grenoble, France
Funding
European Union Horizon 2020;
Keywords
Speech enhancement; Noise measurement; Training; Recording; Inference algorithms; Time-domain analysis; Time series analysis; dynamical variational autoencoders; nonnegative matrix factorization; variational inference; NONNEGATIVE MATRIX FACTORIZATION; SEMI-SUPERVISED SEPARATION; ALGORITHM; NOISE;
DOI
10.1109/TASLP.2022.3207349
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Dynamical variational autoencoders (DVAEs) are a class of deep generative models with latent variables, dedicated to modeling time series of high-dimensional data. DVAEs can be considered as extensions of the variational autoencoder (VAE) that include temporal dependencies between successive observed and/or latent vectors. Previous work has shown the benefit of using DVAEs over the VAE for speech spectrogram modeling. Independently, the VAE has been successfully applied to speech enhancement in noise, in an unsupervised noise-agnostic set-up that requires neither noise samples nor noisy speech samples at training time, only clean speech signals. In this paper, we extend these works to DVAE-based single-channel unsupervised speech enhancement, thereby exploiting both unsupervised representation learning of speech signals and dynamics modeling. We propose an unsupervised speech enhancement algorithm that combines a DVAE speech prior pre-trained on clean speech signals with a noise model based on nonnegative matrix factorization, and we derive a variational expectation-maximization (VEM) algorithm to perform speech enhancement. The algorithm is presented with the most general DVAE formulation and is then applied with three specific DVAE models to illustrate the versatility of the framework. Experimental results show that the proposed DVAE-based approach outperforms its VAE-based counterpart, as well as several supervised and unsupervised noise-dependent baselines, especially when the noise type is unseen during training.
Pages: 2993 - 3007
Page count: 15
Related papers
50 in total
  • [1] ROBUST UNSUPERVISED AUDIO-VISUAL SPEECH ENHANCEMENT USING A MIXTURE OF VARIATIONAL AUTOENCODERS
    Sadeghi, Mostafa
    Alameda-Pineda, Xavier
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7534 - 7538
  • [2] Speech Enhancement Using Dynamical Variational AutoEncoder
    Do, Hao D.
    [J]. INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2023, PT II, 2023, 13996 : 247 - 258
  • [3] SPEECH DEREVERBERATION USING VARIATIONAL AUTOENCODERS
    Baby, Deepak
    Bourlard, Herve
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5784 - 5788
  • [4] A Benchmark of Dynamical Variational Autoencoders applied to Speech Spectrogram Modeling
    Bie, Xiaoyu
    Girin, Laurent
    Leglaive, Simon
    Hueber, Thomas
    Alameda-Pineda, Xavier
    [J]. INTERSPEECH 2021, 2021, : 46 - 50
  • [5] A Statistically Principled and Computationally Efficient Approach to Speech Enhancement using Variational Autoencoders
    Pariente, Manuel
    Deleforge, Antoine
    Vincent, Emmanuel
    [J]. INTERSPEECH 2019, 2019, : 3158 - 3162
  • [6] A VARIANCE MODELING FRAMEWORK BASED ON VARIATIONAL AUTOENCODERS FOR SPEECH ENHANCEMENT
    Leglaive, Simon
    Girin, Laurent
    Horaud, Radu
    [J]. 2018 IEEE 28TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2018,
  • [7] SPEECH ENHANCEMENT WITH VARIATIONAL AUTOENCODERS AND ALPHA-STABLE DISTRIBUTIONS
    Leglaive, Simon
    Simsekli, Umut
    Liutkus, Antoine
    Girin, Laurent
    Horaud, Radu
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 541 - 545
  • [8] Modeling and Transforming Speech using Variational Autoencoders
    Blaauw, Merlijn
    Bonada, Jordi
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1770 - 1774
  • [9] Unsupervised Speech Representation Learning Using WaveNet Autoencoders
    Chorowski, Jan
    Weiss, Ron J.
    Bengio, Samy
    van den Oord, Aaron
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (12) : 2041 - 2053
  • [10] Unsupervised aspect-based summarization using variational autoencoders
    Shan, Huawei
    Lu, Dongyuan
    Zhang, Li
    [J]. Expert Systems with Applications, 2025, 266