Target Speech Extraction: Independent Vector Extraction Guided by Supervised Speaker Identification

Cited by: 7

Authors:

Malek, Jiri [1]

Jansky, Jakub [1]

Koldovsky, Zbynek [1]

Kounovsky, Tomas [1]

Cmejla, Jaroslav [1]

Zdansky, Jindrich [1]

Affiliation:

[1] Tech Univ Liberec, Fac Mech Informat & Interdisciplinary Studies, Liberec 46117, Czech Republic

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2022 / Vol. 30

Keywords:

Speech processing; Data mining; Training; Feature extraction; Time-frequency analysis; Task analysis; Microphones; Blind extraction; supervised speaker identification; target speech extraction; BLIND SOURCE SEPARATION; ROBUST;

DOI:

10.1109/TASLP.2022.3190739

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

This manuscript proposes a novel robust procedure for the extraction of a speaker of interest (SOI) from a mixture of audio sources. The estimation of the SOI is performed via independent vector extraction (IVE). Since the blind IVE cannot distinguish the target source by itself, it is guided towards the SOI via frame-wise speaker identification based on deep learning. Still, an incorrect speaker can be extracted due to guidance failings, especially when processing challenging data. To identify such cases, we propose a criterion for non-intrusively assessing the estimated speaker. It utilizes the same model as the speaker identification, so no additional training is required. When incorrect extraction is detected, we propose a "deflation" step in which the incorrect source is subtracted from the mixture and, subsequently, another attempt to extract the SOI is performed. The process is repeated until successful extraction is achieved. The proposed procedure is experimentally tested on artificial and real-world datasets containing challenging phenomena: source movements, reverberation, transient noise, or microphone failures. The method is compared with state-of-the-art blind algorithms as well as with current fully supervised deep learning-based methods.
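The extract, verify, deflate loop described in the abstract can be sketched as follows. This is a minimal illustration of the control flow only, not the authors' implementation. The names `extract`, `is_target`, `deflate`, and `max_attempts` are hypothetical, the extractor and the speaker check are passed in as placeholder callables, and the subtraction is done by a plain least-squares projection in the time domain, which is an assumption because the abstract does not say how the incorrect source is removed.

```python
import numpy as np


def deflate(mixture, source):
    """Subtract `source` from every channel of `mixture`.

    Assumption: each channel contains a scaled copy of the source, so the
    per-channel gain is estimated by least squares and the scaled copy is
    removed. `mixture` has shape (channels, samples), `source` (samples,).
    """
    energy = source @ source
    if energy == 0:
        return mixture.copy()  # nothing to remove
    gains = mixture @ source / energy  # one least-squares gain per channel
    return mixture - np.outer(gains, source)


def extract_with_deflation(mixture, extract, is_target, max_attempts=3):
    """Repeat extraction until `is_target` accepts the estimate.

    `extract` (hypothetical) maps a mixture to one estimated source.
    `is_target` (hypothetical) returns True when the estimate is the
    speaker of interest. Returns None if every attempt is rejected.
    """
    residual = mixture.copy()
    for _ in range(max_attempts):
        candidate = extract(residual)
        if is_target(candidate):
            return candidate
        # Wrong source: remove it and try again on what is left.
        residual = deflate(residual, candidate)
    return None
```

In the setting of the abstract, `extract` would be the guided IVE step and `is_target` the non-intrusive criterion built on the speaker identification model. The cap on attempts is a design choice of this sketch, since the abstract only states that the process repeats until extraction succeeds.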

Pages: 2295-2309

Page count: 15

50 items in total

[21] A speech-and-speaker identification system: Feature extraction, description, and classification of speech-signal image
Saeed, Khalid
Nammous, Mohammad Kheir
IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2007, 54 (02) : 887 - 897
[22] MULTIMODAL ATTENTION FUSION FOR TARGET SPEAKER EXTRACTION
Sato, Hiroshi
Ochiai, Tsubasa
Kinoshita, Keisuke
Delcroix, Marc
Nakatani, Tomohiro
Araki, Shoko
2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 778 - 784
[23] Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction
Zhao, Zifeng
Gu, Rongzhi
Yang, Dongchao
Tian, Jinchuan
Zou, Yuexian
INTERSPEECH 2022, 2022, : 5318 - 5322
[24] PROBING SELF-SUPERVISED LEARNING MODELS WITH TARGET SPEECH EXTRACTION
Peng, Junyi
Delcroix, Marc
Ochiai, Tsubasa
Plchot, Oldrich
Ashihara, Takanori
Araki, Shoko
Cernocky, Jan
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024, 2024, : 535 - 539
[25] Invariant-integration method for robust feature extraction in speaker-independent speech recognition
Mueller, Florian
Mertins, Alfred
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2939 - 2942
[26] Auxiliary loss function for target speech extraction and recognition with weak supervision based on speaker characteristics
Zmolikova, Katerina
Delcroix, Marc
Raj, Desh
Watanabe, Shinji
Cernocky, Jan Honza
INTERSPEECH 2021, 2021, : 1464 - 1468
[27] Multimodal SpeakerBeam: Single channel target speech extraction with audio-visual speaker clues
Ochiai, Tsubasa
Delcroix, Marc
Kinoshita, Keisuke
Ogawa, Atsunori
Nakatani, Tomohiro
INTERSPEECH 2019, 2019, : 2718 - 2722
[28] Speaker Extraction With Co-Speech Gestures Cue
Pan, Zexu
Qian, Xinyuan
Li, Haizhou
IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1467 - 1471
[29] Speaker Localization and Speech Extraction with the EAR sensor.
Bonnal, Julien
Argentieri, Sylvain
Danes, Patrick
Manhes, Jerome
2009 IEEE-RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, 2009, : 670 - 675
[30] SUBSPACE CONSTRAINED INDEPENDENT VECTOR EXTRACTION
Liu, Tongzheng
Lu, Zhihua
2022 INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC 2022), 2022,