End-to-end recurrent denoising autoencoder embeddings for speaker identification

Cited by: 0
Authors
Esther Rituerto-González
Carmen Peláez-Moreno
Affiliation
[1] University Carlos III of Madrid, Group of Multimedia Processing, Department of Signal Theory and Communications
Source
Neural Computing & Applications, 2021, 33(21)
Keywords
Denoising autoencoder; Speaker embeddings; Noisy conditions; Stress; End-to-end model; Speaker identification
DOI
Not available
Abstract
Speech ‘in-the-wild’ is a handicap for speaker recognition systems because of the variability induced by real-life conditions, such as environmental noise and the emotional state of the speaker. Taking advantage of the principles of representation learning, we aim to design a recurrent denoising autoencoder that extracts robust speaker embeddings from noisy spectrograms to perform speaker identification. The proposed end-to-end architecture uses a feedback loop to encode speaker information into the low-dimensional representations extracted by a spectrogram denoising autoencoder. We apply data augmentation by additively corrupting clean speech with real-life environmental noise, using a database that contains real stressed speech. Our study shows that jointly optimizing the denoiser and the speaker identification module outperforms optimizing the two components independently, and also outperforms handcrafted features, under stress and noise distortions.
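
The additive corruption step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the noise level is set, so scaling the noise to a target signal-to-noise ratio (SNR) is an assumption here, as are the function name `mix_at_snr`, the synthetic sine "speech", and the random "noise"; none of these come from the paper.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Additively corrupt `clean` with `noise` scaled to a target SNR in dB.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    # Tile, then trim, the noise so it covers the whole clean signal.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        raise ValueError("noise signal is silent; cannot scale it to an SNR")
    # Choose the gain so that clean_power / (gain**2 * noise_power)
    # equals 10 ** (snr_db / 10).
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise

# Synthetic stand-ins for a clean utterance and an environmental noise clip.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noise = rng.normal(size=4000)
noisy = mix_at_snr(clean, noise, snr_db=5.0)
```

In an augmentation pipeline of this kind, each clean utterance would typically be mixed with several noise clips at several SNR values, and the clean version kept as the reconstruction target for the denoiser.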
Pages: 14429-14439
Page count: 10
Related papers
50 in total
  • [1] End-to-end recurrent denoising autoencoder embeddings for speaker identification
    Rituerto-Gonzalez, Esther
    Pelaez-Moreno, Carmen
    [J]. NEURAL COMPUTING & APPLICATIONS, 2021, 33(21): 14429-14439
  • [2] End-to-End Chinese Speaker Identification
    Yu, Dian
    Zhou, Ben
    Yu, Dong
    [J]. NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022: 2274-2285
  • [3] Shortcut Connections based Deep Speaker Embeddings for End-to-End Speaker Verification System
    Seo, Soonshin
    Rim, Daniel Jun
    Lim, Minkyu
    Lee, Donghyun
    Park, Hosung
    Oh, Junseok
    Kim, Changmin
    Kim, Ji-Hwan
    [J]. INTERSPEECH 2019, 2019: 2928-2932
  • [4] DEEP NEURAL NETWORK-BASED SPEAKER EMBEDDINGS FOR END-TO-END SPEAKER VERIFICATION
    Snyder, David
    Ghahremani, Pegah
    Povey, Daniel
    Garcia-Romero, Daniel
    Carmiel, Yishay
    Khudanpur, Sanjeev
    [J]. 2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016: 165-170
  • [5] End-to-End Multi-Speaker Speech Recognition using Speaker Embeddings and Transfer Learning
    Denisov, Pavel
    Vu, Ngoc Thang
    [J]. INTERSPEECH 2019, 2019: 4425-4429
  • [6] Improved Relation Networks for End-to-End Speaker Verification and Identification
    Chaubey, Ashutosh
    Sinha, Sparsh
    Ghose, Susmita
    [J]. INTERSPEECH 2022, 2022: 5085-5089
  • [7] FRAME-LEVEL SPEAKER EMBEDDINGS FOR TEXT-INDEPENDENT SPEAKER RECOGNITION AND ANALYSIS OF END-TO-END MODEL
    Shon, Suwon
    Tang, Hao
    Glass, James
    [J]. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018: 1007-1013
  • [8] End-to-end Convolutional Semantic Embeddings
    You, Quanzeng
    Zhang, Zhengyou
    Luo, Jiebo
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 5735-5744
  • [9] Hybrid Network For End-To-End Text-Independent Speaker Identification
    Ghezaiel, Wajdi
    Brun, Luc
    Lezoray, Olivier
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021: 2352-2359
  • [10] End-to-End Active Speaker Detection
    Alcazar, Juan Leon
    Cordes, Moritz
    Zhao, Chen
    Ghanem, Bernard
    [J]. COMPUTER VISION, ECCV 2022, PT XXXVII, 2022, 13697: 126-143