Multi-Cue Guided Semi-Supervised Learning Toward Target Speaker Separation in Real Environments

Cited by: 1
Authors
Xu, Jiaming [1]
Cui, Jian [2, 3]
Hao, Yunzhe [2, 3]
Xu, Bo [2, 3, 4]
Affiliations
[1] Xiaomi Corp, Beijing 100085, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 101408, Peoples R China
[4] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Shanghai 200031, Peoples R China
Keywords
Cocktail party problem; target speaker separation; multi-cue guided separation; semi-supervised learning; SPEECH RECOGNITION; EXTRACTION
DOI
10.1109/TASLP.2023.3323856
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
To solve the cocktail party problem in real multi-talker environments, this article proposes a multi-cue guided semi-supervised target speaker separation method (MuSS). MuSS integrates three target speaker-related cues: spatial, visual, and voiceprint. Under the guidance of these cues, the target speaker is separated into a predefined output channel, and the interfering sources are separated into the other output channels with the optimal permutation. Both synthetic and real mixtures are used for semi-supervised training. For synthetic mixtures, the separated target source and the separated interfering sources are trained to reconstruct the ground-truth references. For real mixtures, a mixture of two real mixtures is fed into the separation model, and the separated sources are remixed to reconstruct the two real mixtures. In addition, to facilitate finetuning and evaluation on real mixtures, we introduce a real multi-modal speech separation dataset, RealMuSS, which was collected in real-world scenarios and comprises more than one hundred hours of multi-talker mixtures with high-quality pseudo references of the target speakers. Experimental results show that the pseudo references effectively improve finetuning efficiency and enable the model to be trained and evaluated on estimated speech from real mixtures, and that various cue-driven separation models improve greatly in signal-to-noise ratio and speech recognition accuracy under our semi-supervised learning framework.
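The two training objectives described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the loss function or how the remixing assignment is chosen, so the negative-SNR loss, the exhaustive search over assignments, and the names `neg_snr`, `synthetic_loss`, and `remix_loss` are all assumptions made for illustration.

```python
import itertools
import numpy as np


def neg_snr(reference, estimate, eps=1e-8):
    """Negative signal-to-noise ratio in dB (assumed loss; lower is better)."""
    noise = reference - estimate
    snr = 10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(noise ** 2) + eps))
    return -snr


def synthetic_loss(separated, target_ref, interferer_refs):
    """Supervised branch: channel 0 is the predefined target channel; the
    remaining channels are matched to the interferer references with the
    best permutation. Assumes len(interferer_refs) == len(separated) - 1."""
    loss_target = neg_snr(target_ref, separated[0])
    others = separated[1:]
    best_interferer_loss = min(
        sum(neg_snr(ref, others[i]) for ref, i in zip(interferer_refs, perm))
        for perm in itertools.permutations(range(len(others)))
    )
    return loss_target + best_interferer_loss


def remix_loss(separated, mix_a, mix_b):
    """Unsupervised branch: assign each separated source to real mixture A (0)
    or B (1), remix, and keep the assignment that reconstructs both best."""
    best_loss, best_assignment = np.inf, None
    for assignment in itertools.product((0, 1), repeat=separated.shape[0]):
        mask = np.array(assignment, dtype=bool)
        remix_a = separated[~mask].sum(axis=0)  # sources assigned to A
        remix_b = separated[mask].sum(axis=0)   # sources assigned to B
        loss = neg_snr(mix_a, remix_a) + neg_snr(mix_b, remix_b)
        if loss < best_loss:
            best_loss, best_assignment = loss, assignment
    return best_loss, best_assignment
```

In this sketch `separated` is an array of shape `(n_sources, n_samples)`. The exhaustive search grows as 2^n in the number of output channels, which is only practical for a small number of channels.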
Pages: 151-163
Number of pages: 13