End-to-end recurrent denoising autoencoder embeddings for speaker identification

Authors
Esther Rituerto-González
Carmen Peláez-Moreno
Affiliation
[1] University Carlos III of Madrid, Group of Multimedia Processing, Department of Signal Theory and Communications
Keywords
Denoising autoencoder; Speaker embeddings; Noisy conditions; Stress; End-to-end model; Speaker identification
DOI
Not available
Abstract
Speech ‘in the wild’ challenges speaker recognition systems because of the variability induced by real-life conditions, such as environmental noise and the emotional state of the speaker. Drawing on the principles of representation learning, we aim to design a recurrent denoising autoencoder that extracts robust speaker embeddings from noisy spectrograms to perform speaker identification. The proposed end-to-end architecture uses a feedback loop to encode speaker information into the low-dimensional representations extracted by a spectrogram denoising autoencoder. We augment the data by additively corrupting clean speech with real-life environmental noise in a database containing real stressed speech. Our study shows that, under stress and noise distortions, jointly optimizing the denoiser and speaker identification modules outperforms both independent optimization of the two components and handcrafted features.
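The additive corruption step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the noise is scaled to reach a chosen signal-to-noise ratio before being added to the clean signal; the abstract does not state the SNR levels or the mixing procedure actually used, so the function name `mix_at_snr`, the 5 dB target, and the synthetic signals below are illustrative assumptions only.

```python
import math
import random


def mean_power(signal):
    """Average power (mean of squared samples) of a sequence of floats."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Additively corrupt `clean` with `noise` scaled to a target SNR in dB.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")
    # Loop the noise so it covers the full length of the clean signal.
    looped = [noise[i % len(noise)] for i in range(len(clean))]
    noise_power = mean_power(looped)
    if noise_power == 0.0:
        raise ValueError("noise has zero power")
    # Choose the gain so that clean_power / (gain**2 * noise_power)
    # equals 10 ** (snr_db / 10).
    gain = math.sqrt(mean_power(clean) / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [c + gain * n for c, n in zip(clean, looped)]


# Synthetic stand-ins: a sine tone for speech, Gaussian samples for noise.
random.seed(0)
clean = [math.sin(2.0 * math.pi * 220.0 * t / 16000.0) for t in range(16000)]
noise = [random.gauss(0.0, 1.0) for _ in range(8000)]

noisy = mix_at_snr(clean, noise, snr_db=5.0)

# Measure the SNR that was actually achieved from the added residual.
residual = [y - c for y, c in zip(noisy, clean)]
achieved_snr_db = 10.0 * math.log10(mean_power(clean) / mean_power(residual))
```

In practice the clean and noise signals would be recorded waveforms, and the corrupted waveform would be converted to a spectrogram before being fed to the denoising autoencoder.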
Pages: 14429-14439
Page count: 10