Whisper Speech Enhancement Using Joint Variational Autoencoder for Improved Speech Recognition

Cited by: 1
Authors
Agrawal, Vikas [1 ]
Kumar, Shashi [1 ]
Rath, Shakti P. [2 ]
Affiliations
[1] Samsung R&D Inst India, Bangalore, Karnataka, India
[2] Reverie Language Technol, Bangalore, Karnataka, India
Source
INTERSPEECH 2021
Keywords
whisper speech recognition; autoencoder; wTIMIT; variational autoencoder; jointVAE
DOI
10.21437/Interspeech.2021-953
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline Codes
100104; 100213
Abstract
Whispering is the natural choice of communication when one wants to interact quietly and privately. Because the acoustic characteristics of whispered and neutral speech differ greatly, recognition performance degrades drastically when whispered speech is decoded by an Automatic Speech Recognition (ASR) system trained on neutral speech. Recently, Denoising Autoencoders (DA) have been used to handle this mismatch between training and test conditions, yielding some improvement. To improve on DA performance, we propose another method that maps speech from the whisper domain to the neutral-speech domain via a Joint Variational Autoencoder (JVAE). The proposed method requires time-aligned parallel data, which is not available, so we developed an algorithm to convert parallel data into time-aligned parallel data. The JVAE jointly learns the characteristics of whispered and neutral speech in a common latent space, which significantly improves whisper recognition accuracy and outperforms traditional autoencoder-based techniques. We benchmarked our method against two baselines: first, an ASR system trained on neutral speech and tested on the whisper dataset, and second, the whisper test set mapped using a DA and decoded by the same neutral ASR. We achieved an absolute improvement of 22.31% in Word Error Rate (WER) over the first baseline and an absolute 5.52% improvement over the DA.
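The abstract describes mapping whisper-domain features into the neutral-speech domain through a joint VAE whose two encoders share a common latent space. The PyTorch sketch below illustrates one plausible reading of that idea; the layer sizes, the single shared decoder, the MSE reconstruction target, the KL weight beta, and the assumption that whisper/neutral frame pairs have already been time-aligned (e.g. by DTW or forced alignment) are all illustrative assumptions, not the configuration reported in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    # Maps one frame of acoustic features to the mean/log-variance of the latent.
    def __init__(self, feat_dim=40, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)

class Decoder(nn.Module):
    # Reconstructs neutral-domain features from a latent sample.
    def __init__(self, feat_dim=40, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, feat_dim))

    def forward(self, z):
        return self.net(z)

class JointVAE(nn.Module):
    # Whisper and neutral encoders share one latent space; a single decoder
    # produces neutral-domain features from either latent (an assumed design).
    def __init__(self, feat_dim=40, latent_dim=64):
        super().__init__()
        self.enc_whisper = Encoder(feat_dim, latent_dim)
        self.enc_neutral = Encoder(feat_dim, latent_dim)
        self.dec_neutral = Decoder(feat_dim, latent_dim)

    @staticmethod
    def reparameterize(mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, x_whisper, x_neutral):
        mu_w, lv_w = self.enc_whisper(x_whisper)
        mu_n, lv_n = self.enc_neutral(x_neutral)
        rec_w = self.dec_neutral(self.reparameterize(mu_w, lv_w))
        rec_n = self.dec_neutral(self.reparameterize(mu_n, lv_n))
        return rec_w, rec_n, (mu_w, lv_w), (mu_n, lv_n)

def jvae_loss(model, x_whisper, x_neutral, beta=0.1):
    # x_whisper and x_neutral are assumed to be time-aligned frame pairs
    # (e.g. obtained via DTW or forced alignment). Both reconstructions target
    # the neutral frames, and KL terms tie both posteriors to a shared
    # standard-normal prior, encouraging a common latent space.
    rec_w, rec_n, (mu_w, lv_w), (mu_n, lv_n) = model(x_whisper, x_neutral)
    rec = F.mse_loss(rec_w, x_neutral) + F.mse_loss(rec_n, x_neutral)
    kl = lambda mu, lv: -0.5 * torch.mean(1.0 + lv - mu.pow(2) - lv.exp())
    return rec + beta * (kl(mu_w, lv_w) + kl(mu_n, lv_n))

# At test time a whisper frame would be passed through enc_whisper and
# dec_neutral, and the resulting neutral-domain features decoded by the
# neutral-speech ASR system.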
Pages: 2706-2710
Number of pages: 5