End-to-end recurrent denoising autoencoder embeddings for speaker identification

Cited by: 0

Authors:

Esther Rituerto-González

Carmen Peláez-Moreno

Affiliations:

[1] University Carlos III of Madrid, Group of Multimedia Processing, Department of Signal Theory and Communications

Source:

Neural Computing and Applications | 2021 / Vol. 33

Keywords:

Denoising autoencoder; Speaker embeddings; Noisy conditions; Stress; End-to-end model; Speaker identification

DOI:

Not available

Abstract:

Speech ‘in-the-wild’ is a handicap for speaker recognition systems due to the variability induced by real-life conditions, such as environmental noise and the emotional state of the speaker. Taking advantage of the principles of representation learning, we aim to design a recurrent denoising autoencoder that extracts robust speaker embeddings from noisy spectrograms to perform speaker identification. The proposed end-to-end architecture uses a feedback loop to encode information regarding the speaker into low-dimensional representations extracted by a spectrogram denoising autoencoder. We employ data augmentation techniques by additively corrupting clean speech with real-life environmental noise in a database containing real stressed speech. Our study shows that the joint optimization of the denoiser and speaker identification modules outperforms both independent optimization of the two components and handcrafted features under stress and noise distortions.
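
The additive corruption step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how noise levels were chosen, so the target signal-to-noise ratio parameter (`snr_db`), the looping of short noise clips, and the function name `mix_at_snr` are all assumptions made for illustration, not the authors' procedure.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Additively corrupt `clean` with `noise` scaled to a target SNR in dB.

    Hypothetical helper: the SNR-based scaling is an assumption, since the
    abstract only says that noise is added to clean speech.
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Loop the noise clip if it is shorter than the speech, then trim to length.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        raise ValueError("noise must not be silent")

    # Pick the gain so that clean_power / (gain**2 * noise_power) = 10**(snr_db / 10).
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise
```

Calling `mix_at_snr(speech, street_noise, 5.0)` on two one-dimensional waveforms would return a noisy waveform of the same length as `speech`; the spectrograms fed to a denoising autoencoder would then be computed from that output.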

Pages: 14429-14439

Page count: 10

50 items in total

[1] End-to-end recurrent denoising autoencoder embeddings for speaker identification
Rituerto-Gonzalez, Esther
Pelaez-Moreno, Carmen
[J]. NEURAL COMPUTING & APPLICATIONS, 2021, 33 (21): 14429-14439
[2] End-to-End Chinese Speaker Identification
Yu, Dian
Zhou, Ben
Yu, Dong
[J]. NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022: 2274-2285
[3] Shortcut Connections based Deep Speaker Embeddings for End-to-End Speaker Verification System
Seo, Soonshin
Rim, Daniel Jun
Lim, Minkyu
Lee, Donghyun
Park, Hosung
Oh, Junseok
Kim, Changmin
Kim, Ji-Hwan
[J]. INTERSPEECH 2019, 2019: 2928-2932
[4] DEEP NEURAL NETWORK-BASED SPEAKER EMBEDDINGS FOR END-TO-END SPEAKER VERIFICATION
Snyder, David
Ghahremani, Pegah
Povey, Daniel
Garcia-Romero, Daniel
Carmiel, Yishay
Khudanpur, Sanjeev
[J]. 2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016: 165-170
[5] End-to-End Multi-Speaker Speech Recognition using Speaker Embeddings and Transfer Learning
Denisov, Pavel
Ngoc Thang Vu
[J]. INTERSPEECH 2019, 2019: 4425-4429
[6] Improved Relation Networks for End-to-End Speaker Verification and Identification
Chaubey, Ashutosh
Sinha, Sparsh
Ghose, Susmita
[J]. INTERSPEECH 2022, 2022: 5085-5089
[7] FRAME-LEVEL SPEAKER EMBEDDINGS FOR TEXT-INDEPENDENT SPEAKER RECOGNITION AND ANALYSIS OF END-TO-END MODEL
Shon, Suwon
Tang, Hao
Glass, James
[J]. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018: 1007-1013
[8] End-to-end Convolutional Semantic Embeddings
You, Quanzeng
Zhang, Zhengyou
Luo, Jiebo
[J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 5735-5744
[9] Hybrid Network For End-To-End Text-Independent Speaker Identification
Ghezaiel, Wajdi
Brun, Luc
Lezoray, Olivier
[J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021: 2352-2359
[10] End-to-End Active Speaker Detection
Alcazar, Juan Leon
Cordes, Moritz
Zhao, Chen
Ghanem, Bernard
[J]. COMPUTER VISION, ECCV 2022, PT XXXVII, 2022, 13697: 126-143