Robust Speaker Extraction Network based on Iterative Refined Adaptation

Cited by: 5
Authors
Deng, Chengyun [1]
Ma, Shiqian [1]
Sha, Yongtao [1]
Zhang, Yi [1]
Zhang, Hui [2]
Song, Hui [1]
Wang, Fei [1]
Affiliations
[1] Didi Chuxing, Beijing, People's Republic of China
[2] Baidu Inc, Beijing, People's Republic of China
Source
Keywords
speaker extraction; iterative refined adaptation; speaker embedding; robustness
DOI
10.21437/Interspeech.2021-2250
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Speaker extraction aims to extract the target speech signal from a multi-talker environment with interfering speakers and surrounding noise, given reference speech from the target speaker. Most speaker extraction systems achieve satisfactory performance in the closed condition. However, such systems suffer from performance degradation given unseen target speakers and/or mismatched reference speech. In this paper we propose a novel strategy named Iterative Refined Adaptation (IRA) to improve the robustness and generalization capability of speaker extraction systems in the aforementioned scenarios. Given an initial speaker embedding encoded by an auxiliary network, the extraction network obtains a latent representation of the target speaker, which is fed back to the auxiliary network to refine the speaker embedding; the refined embedding in turn provides more accurate guidance for the extraction network. Experiments show that the network with IRA outperforms the comparison approaches in terms of SI-SDRi and PESQ on the WSJ0-2mix-extr and WHAM! datasets.
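The feedback loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the two "networks" below are hypothetical stand-ins (fixed random linear maps), the feature sizes are arbitrary, and feeding the extracted estimate itself back into the auxiliary network is an assumption about what the latent representation might look like. Only the control flow (encode, extract, re-encode, extract again) follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
FRAMES, FEAT, EMB = 50, 16, 8  # arbitrary toy sizes (assumption)

# Hypothetical stand-ins for trained networks: fixed random linear maps.
W_AUX = rng.standard_normal((FEAT, EMB)) / np.sqrt(FEAT)
W_GATE = rng.standard_normal((EMB, FEAT)) / np.sqrt(EMB)


def auxiliary_network(features):
    """Encode (frames, FEAT) features into a unit-norm speaker embedding."""
    emb = np.tanh(features.mean(axis=0) @ W_AUX)
    return emb / (np.linalg.norm(emb) + 1e-8)


def extraction_network(mixture, embedding):
    """Apply an embedding-conditioned sigmoid mask to the mixture features."""
    mask = 1.0 / (1.0 + np.exp(-(embedding @ W_GATE)))  # shape (FEAT,)
    return mixture * mask  # mask is broadcast across frames


def iterative_refined_adaptation(mixture, reference, num_iters=2):
    """Toy IRA loop: refine the embedding from the current estimate."""
    # Initial embedding comes from the reference speech.
    embedding = auxiliary_network(reference)
    estimate = extraction_network(mixture, embedding)
    for _ in range(num_iters):
        # Feedback step (assumed form): re-encode the current estimate.
        embedding = auxiliary_network(estimate)
        # Extract again, guided by the refined embedding.
        estimate = extraction_network(mixture, embedding)
    return estimate, embedding


mixture = rng.standard_normal((FRAMES, FEAT))
reference = rng.standard_normal((FRAMES, FEAT))
estimate, embedding = iterative_refined_adaptation(mixture, reference)
print(estimate.shape, embedding.shape)
```

With `num_iters=0` the function reduces to a conventional single-pass extraction conditioned only on the reference speech, which makes the effect of the refinement steps easy to isolate in an experiment.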
Pages: 3530-3534
Number of pages: 5