Target Speech Extraction: Independent Vector Extraction Guided by Supervised Speaker Identification

Cited by: 7
Authors
Malek, Jiri [1 ]
Jansky, Jakub [1 ]
Koldovsky, Zbynek [1 ]
Kounovsky, Tomas [1 ]
Cmejla, Jaroslav [1 ]
Zdansky, Jindrich [1 ]
Affiliations
[1] Tech Univ Liberec, Fac Mech Informat & Interdisciplinary Studies, Liberec 46117, Czech Republic
Keywords
Speech processing; Data mining; Training; Feature extraction; Time-frequency analysis; Task analysis; Microphones; Blind extraction; supervised speaker identification; target speech extraction; BLIND SOURCE SEPARATION; ROBUST;
DOI
10.1109/TASLP.2022.3190739
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
This manuscript proposes a novel robust procedure for the extraction of a speaker of interest (SOI) from a mixture of audio sources. The estimation of the SOI is performed via independent vector extraction (IVE). Since the blind IVE cannot distinguish the target source by itself, it is guided towards the SOI via frame-wise speaker identification based on deep learning. Still, an incorrect speaker can be extracted due to guidance failings, especially when processing challenging data. To identify such cases, we propose a criterion for non-intrusively assessing the estimated speaker. It utilizes the same model as the speaker identification, so no additional training is required. When incorrect extraction is detected, we propose a "deflation" step in which the incorrect source is subtracted from the mixture and, subsequently, another attempt to extract the SOI is performed. The process is repeated until successful extraction is achieved. The proposed procedure is experimentally tested on artificial and real-world datasets containing challenging phenomena: source movements, reverberation, transient noise, or microphone failures. The method is compared with state-of-the-art blind algorithms as well as with current fully supervised deep learning-based methods.
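The extract, assess, deflate, and retry loop described in the abstract can be sketched roughly as follows. This is only an illustration of the control flow, not the authors' method: the function names (`extract_with_deflation`, `dominant_component`, `matches_target`) are hypothetical, a rank-1 SVD component stands in for independent vector extraction, and correlation with a known reference stands in for the non-intrusive, speaker-identification-based criterion (the stand-in is intrusive, because it needs the clean target).

```python
import numpy as np


def extract_with_deflation(mixture, extract, is_target, max_attempts=3):
    """Retry extraction, subtracting each rejected source from the mixture.

    `extract` and `is_target` are hypothetical callables supplied by the caller.
    """
    residual = mixture.copy()
    for attempt in range(1, max_attempts + 1):
        estimate, image = extract(residual)
        if is_target(estimate):
            return estimate, attempt
        # Deflation step: remove the wrongly extracted source, then try again.
        residual = residual - image
    return None, max_attempts


def dominant_component(x):
    """Toy stand-in for IVE (assumption): the strongest rank-1 SVD component."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    estimate = vt[0]                          # source estimate, unit norm
    image = s[0] * np.outer(u[:, 0], vt[0])   # its contribution on all channels
    return estimate, image


# Toy two-channel mixture: a loud interferer and a quieter target.
n = 1000
t = np.arange(n) / n
interferer = np.sin(2 * np.pi * 5 * t)
target = np.sin(2 * np.pi * 13 * t)
mixing = np.array([[3.0, 1.0], [3.0, -1.0]])  # orthogonal columns, toy choice
mixture = mixing @ np.vstack([interferer, target])


def matches_target(estimate):
    """Toy stand-in for the assessment criterion (assumption, intrusive)."""
    return abs(np.corrcoef(estimate, target)[0, 1]) > 0.9


estimate, attempts = extract_with_deflation(
    mixture, dominant_component, matches_target
)
print("accepted after attempt:", attempts)
```

In this toy setup the first attempt picks up the louder interferer, the check rejects it, and the target is recovered only after the interferer's image has been subtracted. The orthogonal mixing columns and sinusoidal sources are chosen so that the SVD stand-in separates the sources cleanly; real recordings would not behave this way.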
Pages: 2295 - 2309
Number of pages: 15
Related papers
50 in total
  • [21] A speech-and-speaker identification system: Feature extraction, description, and classification of speech-signal image
    Saeed, Khalid
    Nammous, Mohammad Kheir
    IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2007, 54 (02) : 887 - 897
  • [22] MULTIMODAL ATTENTION FUSION FOR TARGET SPEAKER EXTRACTION
    Sato, Hiroshi
    Ochiai, Tsubasa
    Kinoshita, Keisuke
    Delcroix, Marc
    Nakatani, Tomohiro
    Araki, Shoko
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 778 - 784
  • [23] Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction
    Zhao, Zifeng
    Gu, Rongzhi
    Yang, Dongchao
    Tian, Jinchuan
    Zou, Yuexian
    INTERSPEECH 2022, 2022, : 5318 - 5322
  • [24] PROBING SELF-SUPERVISED LEARNING MODELS WITH TARGET SPEECH EXTRACTION
    Peng, Junyi
    Delcroix, Marc
    Ochiai, Tsubasa
    Plchot, Oldrich
    Ashihara, Takanori
    Araki, Shoko
    Cernocky, Jan
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024, 2024, : 535 - 539
  • [25] Invariant-integration method for robust feature extraction in speaker-independent speech recognition
    Mueller, Florian
    Mertins, Alfred
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2939 - 2942
  • [26] Auxiliary loss function for target speech extraction and recognition with weak supervision based on speaker characteristics
    Zmolikova, Katerina
    Delcroix, Marc
    Raj, Desh
    Watanabe, Shinji
    Cernocky, Jan Honza
    INTERSPEECH 2021, 2021, : 1464 - 1468
  • [27] Multimodal SpeakerBeam: Single channel target speech extraction with audio-visual speaker clues
    Ochiai, Tsubasa
    Delcroix, Marc
    Kinoshita, Keisuke
    Ogawa, Atsunori
    Nakatani, Tomohiro
    INTERSPEECH 2019, 2019, : 2718 - 2722
  • [28] Speaker Extraction With Co-Speech Gestures Cue
    Pan, Zexu
    Qian, Xinyuan
    Li, Haizhou
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1467 - 1471
  • [29] Speaker Localization and Speech Extraction with the EAR sensor.
    Bonnal, Julien
    Argentieri, Sylvain
    Danes, Patrick
    Manhes, Jerome
    2009 IEEE-RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, 2009, : 670 - 675
  • [30] SUBSPACE CONSTRAINED INDEPENDENT VECTOR EXTRACTION
    Liu, Tongzheng
    Lu, Zhihua
    2022 INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC 2022), 2022,