Robust Speaker Extraction Network based on Iterative Refined Adaptation

Times cited: 5

Authors:

Deng, Chengyun [1]

Ma, Shiqian [1]

Sha, Yongtao [1]

Zhang, Yi [1]

Zhang, Hui [2]

Song, Hui [1]

Wang, Fei [1]

Affiliations:

[1] Didi Chuxing, Beijing, Peoples R China

[2] Baidu Inc, Beijing, Peoples R China

Source:

INTERSPEECH 2021 | 2021

Keywords:

speaker extraction; iterative refined adaptation; speaker embedding; robustness

DOI:

10.21437/Interspeech.2021-2250

CLC number:

R36 [Pathology]; R76 [Otorhinolaryngology]

Subject classification code:

100104; 100213

Abstract:

Speaker extraction aims to extract the target speech signal from a multi-talker environment with interfering speakers and surrounding noise, given a reference utterance from the target speaker. Most speaker extraction systems achieve satisfactory performance under the closed condition. However, such systems suffer performance degradation when given unseen target speakers and/or mismatched reference speech. In this paper we propose a novel strategy named Iterative Refined Adaptation (IRA) to improve the robustness and generalization capability of speaker extraction systems in these scenarios. Given an initial speaker embedding encoded by an auxiliary network, the extraction network obtains a latent representation of the target speaker, which is fed back to the auxiliary network to refine the speaker embedding; the refined embedding in turn provides more accurate guidance for the extraction network. Experiments show that the network with IRA outperforms the comparison approaches in terms of SI-SDRi and PESQ on the WSJ0-2mix-extr and WHAM! datasets.
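The feedback loop described in the abstract can be sketched as follows. This is a minimal, non-authoritative illustration of the control flow only: the function names, the signatures, the toy 1-D "embedding", the averaging refiner, and the default of two refinement steps are all hypothetical assumptions, since this record does not describe the actual networks or how many iterations the authors use.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]

# Hypothetical signatures: the record does not specify the real networks.
AuxiliaryNet = Callable[[Sequence[float]], Vector]
ExtractionNet = Callable[[Sequence[float], Vector], Tuple[Vector, Vector]]
Refiner = Callable[[Vector, Vector], Vector]


def iterative_refined_adaptation(
    mixture: Sequence[float],
    reference: Sequence[float],
    auxiliary: AuxiliaryNet,
    extractor: ExtractionNet,
    refiner: Refiner,
    num_iterations: int = 2,  # assumed value, not taken from the paper
) -> Tuple[Vector, Vector]:
    """Run an IRA-style feedback loop and return (estimate, final embedding)."""
    # Initial speaker embedding encoded from the reference utterance.
    embedding = auxiliary(reference)
    # First extraction pass guided by the initial embedding.
    estimate, latent = extractor(mixture, embedding)
    for _ in range(num_iterations):
        # Feed the extractor's latent speaker representation back to refine the embedding.
        embedding = refiner(embedding, latent)
        # Re-run extraction with the refined embedding.
        estimate, latent = extractor(mixture, embedding)
    return estimate, embedding


# --- Toy stand-ins (hypothetical, for illustration only) ---
def toy_auxiliary(reference: Sequence[float]) -> Vector:
    # 1-D "embedding": the mean of the reference signal.
    return [sum(reference) / len(reference)]


def toy_extractor(mixture: Sequence[float], embedding: Vector) -> Tuple[Vector, Vector]:
    # "Extract" by scaling the mixture with the embedding; the latent is the mixture mean.
    estimate = [embedding[0] * x for x in mixture]
    latent = [sum(mixture) / len(mixture)]
    return estimate, latent


def toy_refiner(embedding: Vector, latent: Vector) -> Vector:
    # Move each embedding component halfway toward the latent representation.
    return [0.5 * e + 0.5 * z for e, z in zip(embedding, latent)]


if __name__ == "__main__":
    est, emb = iterative_refined_adaptation(
        [2.0, 2.0], [1.0, 1.0], toy_auxiliary, toy_extractor, toy_refiner, num_iterations=2
    )
    print(emb, est)  # [1.75] [3.5, 3.5]
```

With a reference whose statistic (mean 1.0) is mismatched to the mixture (mean 2.0), each refinement step pulls the embedding toward what the extractor observes in the mixture (1.0, then 1.5, then 1.75). The toy only mimics the direction of the feedback; it is not the paper's actual refinement mechanism.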

Pages: 3530-3534

Number of pages: 5