Multi-Cue Guided Semi-Supervised Learning Toward Target Speaker Separation in Real Environments

Cited by: 1

Authors:

Xu, Jiaming ^{[1]}

Cui, Jian ^{[2,3]}

Hao, Yunzhe ^{[2,3]}

Xu, Bo ^{[2,3,4]}

Affiliations:

[1] Xiaomi Corp, Beijing 100085, Peoples R China

[2] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China

[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 101408, Peoples R China

[4] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Shanghai 200031, Peoples R China

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2024 / Vol. 32

Keywords:

Cocktail party problem; target speaker separation; multi-cue guided separation; semi-supervised learning; SPEECH RECOGNITION; EXTRACTION;

DOI:

10.1109/TASLP.2023.3323856

Chinese Library Classification number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

To solve the cocktail party problem in real multi-talker environments, this article proposes a multi-cue guided semi-supervised target speaker separation method (MuSS). MuSS integrates three target-speaker-related cues: spatial, visual, and voiceprint. Under the guidance of these cues, the target speaker is separated into a predefined output channel, and the interfering sources are separated into the other output channels with the optimal permutation. Both synthetic and real mixtures are used for semi-supervised training. For synthetic mixtures, the separated target source and the separated interfering sources are trained to reconstruct the ground-truth references; for real mixtures, a mixture of two real mixtures is fed into the separation model, and the separated sources are remixed to reconstruct the two real mixtures. In addition, to facilitate finetuning and evaluation on real mixtures, we introduce a real multi-modal speech separation dataset, RealMuSS, which is collected in real-world scenarios and comprises more than one hundred hours of multi-talker mixtures with high-quality pseudo references of the target speakers. Experimental results show that the pseudo references effectively improve finetuning efficiency and enable the model to learn from, and be evaluated on, real mixtures, and that various cue-driven separation models are greatly improved in signal-to-noise ratio and speech recognition accuracy under our semi-supervised learning framework.
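
The remixing step for real mixtures described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a brute-force search over source-to-mixture assignments in the style of mixture invariant training, a mean-squared-error objective, and a hypothetical function name `remix_loss`. The abstract does not state the loss actually used or how the cue-guided target channel is handled for real mixtures.

```python
import itertools

import numpy as np


def remix_loss(separated, mix_a, mix_b):
    """Best remix error for two real mixtures (illustrative assumption).

    separated: array of shape (n_sources, n_samples), the model outputs
               obtained from the input mix_a + mix_b.
    mix_a, mix_b: arrays of shape (n_samples,), the two real mixtures.
    Returns (lowest error, assignment), where assignment[i] is 0 if
    source i is remixed into mix_a and 1 if it is remixed into mix_b.
    """
    n_sources = separated.shape[0]
    best_err, best_assign = np.inf, None
    # Try every way of assigning each separated source to one mixture.
    for assign in itertools.product((0, 1), repeat=n_sources):
        mask = np.array(assign, dtype=bool)
        est_a = separated[~mask].sum(axis=0)  # sources remixed into mix_a
        est_b = separated[mask].sum(axis=0)   # sources remixed into mix_b
        err = np.mean((est_a - mix_a) ** 2) + np.mean((est_b - mix_b) ** 2)
        if err < best_err:
            best_err, best_assign = err, assign
    return best_err, best_assign
```

The exhaustive search costs 2^n evaluations for n output channels, which is acceptable only for the small channel counts typical of separation models; no ground-truth sources are needed, only the two recorded mixtures.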

Pages: 151-163

Page count: 13
