Multi-Cue Guided Semi-Supervised Learning Toward Target Speaker Separation in Real Environments

Times cited: 1

Authors:

Xu, Jiaming [1]

Cui, Jian [2,3]

Hao, Yunzhe [2,3]

Xu, Bo [2,3,4]

Affiliations:

[1] Xiaomi Corp, Beijing 100085, Peoples R China

[2] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China

[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 101408, Peoples R China

[4] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Shanghai 200031, Peoples R China

Source:

IEEE/ACM Transactions on Audio, Speech, and Language Processing | 2024 / Vol. 32

Keywords:

Cocktail party problem; target speaker separation; multi-cue guided separation; semi-supervised learning; speech recognition; extraction

DOI:

10.1109/TASLP.2023.3323856

CLC (Chinese Library Classification) number:

O42 [Acoustics]

Discipline code:

070206; 082403

Abstract:

To solve the cocktail party problem in real multi-talker environments, this article proposes a multi-cue guided semi-supervised target speaker separation method (MuSS). MuSS integrates three target-speaker-related cues: spatial, visual, and voiceprint. Guided by these cues, the model separates the target speaker into a predefined output channel and separates the interfering sources into the other output channels under the optimal permutation. Both synthetic and real mixtures are used for semi-supervised training. For synthetic mixtures, the separated target source and the separated interfering sources are trained to reconstruct their ground-truth references. For real mixtures, two real mixtures are mixed together and fed into the separation model, and the separated sources are remixed to reconstruct the two original real mixtures. In addition, to facilitate finetuning and evaluating the estimated sources on real mixtures, we introduce RealMuSS, a real multi-modal speech separation dataset collected in real-world scenarios that comprises more than one hundred hours of multi-talker mixtures with high-quality pseudo references for the target speakers. Experimental results show that the pseudo references effectively improve finetuning efficiency and allow the model both to learn to estimate speech on real mixtures and to be evaluated on them, and that our semi-supervised learning framework greatly improves the signal-to-noise ratio and speech recognition accuracy of various cue-driven separation models.
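The two training objectives summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes negative SNR as the reconstruction loss and treats output channel 0 as the predefined target channel. It also models the real-mixture objective on mixture invariant training (MixIT), in which each separated source is assigned to exactly one of the two input mixtures. These choices, along with the names `snr_db`, `synthetic_loss`, and `remix_loss`, are hypothetical; the paper's exact losses and its handling of the target channel during remixing may differ. Signals are plain lists of floats so that the example has no dependencies.

```python
import itertools
import math
from typing import List, Sequence

Signal = Sequence[float]


def snr_db(est: Signal, ref: Signal, eps: float = 1e-8) -> float:
    """Signal-to-noise ratio (dB) of an estimate against its reference."""
    signal = sum(r * r for r in ref)
    error = sum((r - e) ** 2 for r, e in zip(ref, est))
    return 10.0 * math.log10((signal + eps) / (error + eps))


def synthetic_loss(est: List[Signal], target: Signal,
                   interferers: List[Signal]) -> float:
    """Supervised loss on a synthetic mixture (assumed form).

    Channel 0 is the hypothetical predefined target channel and is never
    permuted; the remaining channels are matched to the interferer
    references with the loss-minimizing permutation (PIT).
    """
    if len(est) != len(interferers) + 1:
        raise ValueError("expected one target channel plus one per interferer")
    target_loss = -snr_db(est[0], target)
    others = est[1:]
    # Try every assignment of interferer outputs to interferer references.
    best = min(
        sum(-snr_db(others[i], interferers[j]) for i, j in enumerate(perm))
        for perm in itertools.permutations(range(len(interferers)))
    )
    return target_loss + best


def remix_loss(est: List[Signal], mix1: Signal, mix2: Signal) -> float:
    """Unsupervised loss on a mixture of two real mixtures (MixIT-style).

    The model is assumed to have received mix1 + mix2. Each separated source
    is assigned to exactly one of the two mixtures, and the assignment whose
    remixes best reconstruct mix1 and mix2 defines the loss.
    """
    n = len(mix1)
    best = math.inf
    for assign in itertools.product((0, 1), repeat=len(est)):
        remix = [[0.0] * n, [0.0] * n]
        for src, k in zip(est, assign):
            remix[k] = [acc + s for acc, s in zip(remix[k], src)]
        loss = -snr_db(remix[0], mix1) - snr_db(remix[1], mix2)
        best = min(best, loss)
    return best
```

The target channel is excluded from the permutation search. As a result, placing an interferer in channel 0 is penalized even when every waveform appears somewhere in the output; this is what ties the cue-guided target to a fixed output. Both searches are exhaustive: the permutation search grows factorially with the number of interferer channels, and the remix search grows as 2 to the power of the number of outputs. That cost is acceptable only for the small channel counts typical of speech separation.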


Pages: 151-163

Number of pages: 13
