Target Speech Extraction: Independent Vector Extraction Guided by Supervised Speaker Identification

Cited by: 5
Authors
Malek, Jiri [1 ]
Jansky, Jakub [1 ]
Koldovsky, Zbynek [1 ]
Kounovsky, Tomas [1 ]
Cmejla, Jaroslav [1 ]
Zdansky, Jindrich [1 ]
Affiliations
[1] Tech Univ Liberec, Fac Mech Informat & Interdisciplinary Studies, Liberec 46117, Czech Republic
Keywords
Speech processing; Data mining; Training; Feature extraction; Time-frequency analysis; Task analysis; Microphones; Blind extraction; supervised speaker identification; target speech extraction; BLIND SOURCE SEPARATION; ROBUST;
DOI
10.1109/TASLP.2022.3190739
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206 ; 082403 ;
Abstract
This manuscript proposes a novel robust procedure for the extraction of a speaker of interest (SOI) from a mixture of audio sources. The estimation of the SOI is performed via independent vector extraction (IVE). Since the blind IVE cannot distinguish the target source by itself, it is guided towards the SOI via frame-wise speaker identification based on deep learning. Still, an incorrect speaker can be extracted due to guidance failings, especially when processing challenging data. To identify such cases, we propose a criterion for non-intrusively assessing the estimated speaker. It utilizes the same model as the speaker identification, so no additional training is required. When incorrect extraction is detected, we propose a "deflation" step in which the incorrect source is subtracted from the mixture and, subsequently, another attempt to extract the SOI is performed. The process is repeated until successful extraction is achieved. The proposed procedure is experimentally tested on artificial and real-world datasets containing challenging phenomena: source movements, reverberation, transient noise, or microphone failures. The method is compared with state-of-the-art blind algorithms as well as with current fully supervised deep learning-based methods.
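The abstract describes an iterative extract-assess-deflate loop. The toy sketch below illustrates that control flow only; all function names, the similarity threshold, and the stand-ins for IVE and the speaker-identification score are illustrative assumptions, not the paper's actual implementation (here each "source" doubles as its own embedding for simplicity).

```python
def cosine(a, b):
    """Cosine similarity between two vectors (stand-in for the assessment criterion)."""
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return num / den if den else 0.0

def blind_extract(sources):
    """Stand-in for blind IVE: returns the currently dominant component."""
    return max(sources, key=lambda s: sum(x * x for x in s))

def extract_target(sources, target_embedding, threshold=0.9, max_attempts=3):
    """Extract-assess-deflate loop sketched from the abstract."""
    remaining = list(sources)
    for _ in range(max_attempts):
        if not remaining:
            break
        estimate = blind_extract(remaining)
        # Non-intrusive assessment: score the estimate against the SOI embedding.
        if cosine(estimate, target_embedding) >= threshold:
            return estimate
        # "Deflation": subtract the incorrectly extracted source and retry.
        remaining.remove(estimate)
    return None  # extraction failed within the attempt budget

# Usage: the interferer [3, 0] dominates, so the first attempt extracts it,
# the assessment rejects it, and deflation recovers the SOI [0, 1].
soi = extract_target([[3.0, 0.0], [0.0, 1.0]], target_embedding=[0.0, 1.0])
```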
Pages: 2295 - 2309
Page count: 15
Related Papers
50 records in total
  • [1] BLIND EXTRACTION OF TARGET SPEECH SOURCE: THREE WAYS OF GUIDANCE EXPLOITING SUPERVISED SPEAKER EMBEDDINGS
    Malek, Jiri
    Cmejla, Jaroslav
    Koldovsky, Zbynek
    2022 INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC 2022), 2022,
  • [2] RECURSIVE AND PARTIALLY SUPERVISED ALGORITHMS FOR SPEECH ENHANCEMENT ON THE BASIS OF INDEPENDENT VECTOR EXTRACTION
    Kounovsky, Tomas
    Koldovsky, Zbynek
    Cmejla, Jaroslav
    2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018, : 401 - 405
  • [3] Speech extraction of a target speaker from one mixed speech signal
    Azetsu, Tadahiro
    Uchino, Eiji
    Suetake, Noriaki
    IEEJ Transactions on Electronics, Information and Systems, 2007, 127 (06) : 970 - 971
  • [4] Simplification of I-Vector Extraction for Speaker Identification
    Xu, Longting
    Yang, Zhen
    Sun, Linhui
    Chinese Journal of Electronics, 2016, 25 (06) : 1121 - 1126
  • [5] SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures
    Zmolikova, Katerina
    Delcroix, Marc
    Kinoshita, Keisuke
    Ochiai, Tsubasa
    Nakatani, Tomohiro
    Burget, Lukas
    Cernocky, Jan
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2019, 13 (04) : 800 - 814
  • [7] TEnet: target speaker extraction network with accumulated speaker embedding for automatic speech recognition
    Li, Wenjie
    Zhang, Pengyuan
    Yan, Yonghong
    ELECTRONICS LETTERS, 2019, 55 (14) : 816 - 818
  • [8] MULTI-CHANNEL TARGET SPEECH EXTRACTION WITH CHANNEL DECORRELATION AND TARGET SPEAKER ADAPTATION
    Han, Jiangyu
    Zhou, Xinyuan
    Long, Yanhua
    Li, Yijie
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6094 - 6098
  • [9] SPEAKER REINFORCEMENT USING TARGET SOURCE EXTRACTION FOR ROBUST AUTOMATIC SPEECH RECOGNITION
    Zorila, Catalin
    Doddipatla, Rama
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6297 - 6301
  • [10] IMPROVING SPEAKER DISCRIMINATION OF TARGET SPEECH EXTRACTION WITH TIME-DOMAIN SPEAKERBEAM
    Delcroix, Marc
    Ochiai, Tsubasa
    Zmolikova, Katerina
    Kinoshita, Keisuke
    Tawara, Naohiro
    Nakatani, Tomohiro
    Araki, Shoko
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 691 - 695