Robust Speaker Extraction Network based on Iterative Refined Adaptation

Cited by: 5
Authors
Deng, Chengyun [1]
Ma, Shiqian [1]
Sha, Yongtao [1]
Zhang, Yi [1]
Zhang, Hui [2]
Song, Hui [1]
Wang, Fei [1]
Affiliations
[1] Didi Chuxing, Beijing, Peoples R China
[2] Baidu Inc, Beijing, Peoples R China
Source
Keywords
speaker extraction; iterative refined adaptation; speaker embedding; robustness
DOI
10.21437/Interspeech.2021-2250
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Speaker extraction aims to extract the target speech signal from a multi-talker environment with interfering speakers and surrounding noise, given a reference utterance from the target speaker. Most speaker extraction systems achieve satisfactory performance in the closed condition, but they suffer from performance degradation given unseen target speakers and/or mismatched reference speech. In this paper we propose a novel strategy named Iterative Refined Adaptation (IRA) to improve the robustness and generalization capability of speaker extraction systems in these scenarios. Given an initial speaker embedding encoded by an auxiliary network, the extraction network obtains a latent representation of the target speaker, which is fed back to the auxiliary network to refine the speaker embedding; the refined embedding in turn provides more accurate guidance for the extraction network. Experiments show that the network with IRA outperforms the comparison approaches in terms of SI-SDRi and PESQ on the WSJ0-2mix-extr and WHAM! datasets.
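The feedback loop described in the abstract can be sketched roughly as follows. This is only a toy illustration of the control flow, not the authors' implementation: the functions `auxiliary_network`, `extraction_network`, and `iterative_refined_adaptation`, the random weights, the feature sizes, and the choice to replace (rather than combine) the embedding at each iteration are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT, EMB = 16, 8  # hypothetical feature and embedding sizes

# Random stand-ins for trained weights (assumption: no real model is loaded).
W_aux = rng.standard_normal((FEAT, EMB)) * 0.1
W_ext = rng.standard_normal((EMB, FEAT)) * 0.1


def auxiliary_network(frames):
    """Toy speaker encoder: (T, FEAT) frames -> (EMB,) speaker embedding."""
    return np.tanh(frames.mean(axis=0) @ W_aux)


def extraction_network(mixture, embedding):
    """Toy extractor: applies a sigmoid mask conditioned on the embedding.

    Returns a (T, FEAT) latent representation of the target speaker.
    """
    mask = 1.0 / (1.0 + np.exp(-(embedding @ W_ext)))  # shape (FEAT,)
    return mixture * mask


def iterative_refined_adaptation(mixture, reference, num_iters=2):
    """Refine the speaker embedding by feeding the extractor output back."""
    # Initial embedding comes from the reference speech.
    embedding = auxiliary_network(reference)
    latent = extraction_network(mixture, embedding)
    for _ in range(num_iters):
        # Feedback step: re-encode the current target representation.
        embedding = auxiliary_network(latent)
        # Re-run extraction with the refined embedding.
        latent = extraction_network(mixture, embedding)
    return latent, embedding


mixture = rng.standard_normal((50, FEAT))    # toy mixture features
reference = rng.standard_normal((30, FEAT))  # toy reference features
estimate, embedding = iterative_refined_adaptation(mixture, reference)
```

With `num_iters=0` the sketch reduces to a plain one-pass extraction guided only by the reference embedding, which makes it easy to compare against the refined variant.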
Pages: 3530 - 3534
Number of pages: 5
Related papers
50 in total
  • [1] Iterative PLDA Adaptation for Speaker Diarization
    Le Lan, Gael
    Charlet, Delphine
    Larcher, Anthony
    Meignier, Sylvain
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2175 - 2179
  • [2] Speaker adaptation based on judge network with small adaptation words
    Jeong, JH
    Lee, SY
    [J]. IJCNN 2000: PROCEEDINGS OF THE IEEE-INNS-ENNS INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOL VI, 2000, : 87 - 90
  • [3] LEARNING SPEAKER REPRESENTATION FOR NEURAL NETWORK BASED MULTICHANNEL SPEAKER EXTRACTION
    Zmolikova, Katerina
    Delcroix, Marc
    Kinoshita, Keisuke
    Higuchi, Takuya
    Ogawa, Atsunori
    Nakatani, Tomohiro
    [J]. 2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 8 - 15
  • [4] EMAP-based speaker adaptation with robust correlation estimation
    Jon, E
    Kim, DK
    Kim, NS
    [J]. 2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING, 2001, : 321 - 324
  • [5] Robust correlation estimation for EMAP-Based speaker adaptation
    Jon, E
    Kim, DK
    Kim, NS
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2001, 8 (06) : 184 - 186
  • [6] Comparison of Speaker Adaptation Methods as Feature Extraction for SVM-Based Speaker Recognition
    Ferras, Marc
    Leung, Cheung-Chi
    Barras, Claude
    Gauvain, Jean-Luc
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (06) : 1366 - 1378
  • [7] UTTERANCE-WISE RECURRENT DROPOUT AND ITERATIVE SPEAKER ADAPTATION FOR ROBUST MONAURAL SPEECH RECOGNITION
    Wang, Peidong
    Wang, DeLiang
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4814 - 4818
  • [8] Iterative unsupervised speaker adaptation for batch dictation
    Homma, S
    Takahashi, J
    Sagayama, S
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1141 - 1144
  • [9] An approach to robust unsupervised speaker adaptation
    Kim, NS
    Seo, DJ
    Lim, W
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2005, 12 (06) : 469 - 472
  • [10] Intermediate-layer DNN Adaptation for Offline and Session-based Iterative Speaker Adaptation
    Kumar, Kshitiz
    Liu, Chaojun
    Yao, Kaisheng
    Gong, Yifan
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1091 - 1095