Multi-Cue Guided Semi-Supervised Learning Toward Target Speaker Separation in Real Environments

Cited by: 1
Authors
Xu, Jiaming [1]
Cui, Jian [2,3]
Hao, Yunzhe [2,3]
Xu, Bo [2,3,4]
Affiliations
[1] Xiaomi Corp, Beijing 100085, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 101408, Peoples R China
[4] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Shanghai 200031, Peoples R China
Keywords
Cocktail party problem; target speaker separation; multi-cue guided separation; semi-supervised learning; speech recognition; extraction
DOI
10.1109/TASLP.2023.3323856
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
To solve the cocktail party problem in real multi-talker environments, this article proposes a multi-cue guided semi-supervised target speaker separation method (MuSS). MuSS integrates three target speaker-related cues: spatial, visual, and voiceprint. Under the guidance of these cues, the target speaker is separated into a predefined output channel, and the interfering sources are separated into the other output channels with the optimal permutation. Both synthetic and real mixtures are used for semi-supervised training. For synthetic mixtures, the separated target source and the separated interfering sources are trained to reconstruct the ground-truth references. For real mixtures, the mixture of two real mixtures is fed into the separation model, and the separated sources are remixed to reconstruct the two real mixtures. To facilitate finetuning and evaluation on real mixtures, we also introduce a real multi-modal speech separation dataset, RealMuSS, which was collected in real-world scenarios and comprises more than one hundred hours of multi-talker mixtures with high-quality pseudo references of the target speakers. Experimental results show that the pseudo references effectively improve finetuning efficiency and enable the model to learn from, and be evaluated on, real mixtures, and that various cue-driven separation models improve greatly in signal-to-noise ratio and speech recognition accuracy under our semi-supervised learning framework.
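The two training objectives described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`neg_snr`, `supervised_loss`, `remix_loss`), the negative-SNR loss, and the brute-force search over remix assignments are all assumptions made for illustration, since the abstract does not specify the loss function or how the remix assignment is chosen.

```python
import itertools
import numpy as np


def neg_snr(ref, est, eps=1e-8):
    """Negative signal-to-noise ratio in dB (lower is better). Assumed loss."""
    noise = ref - est
    return -10.0 * np.log10((np.sum(ref ** 2) + eps) / (np.sum(noise ** 2) + eps))


def supervised_loss(est, target_ref, interferer_refs):
    """Synthetic-mixture objective (hypothetical sketch).

    est: array of shape (n_outputs, T). Output channel 0 is pinned to the
    target speaker; the remaining channels are matched to the interfering
    references with the best-scoring permutation.
    """
    target_term = neg_snr(target_ref, est[0])
    rest = est[1:]
    n = len(interferer_refs)
    best = min(
        sum(neg_snr(interferer_refs[i], rest[p[i]]) for i in range(n))
        for p in itertools.permutations(range(len(rest)), n)
    )
    return (target_term + best) / (1 + n)


def remix_loss(est, mix_a, mix_b):
    """Real-mixture objective (hypothetical sketch).

    The model input is assumed to be mix_a + mix_b. Each separated source is
    assigned to one of the two real mixtures, and the assignment whose remixes
    best reconstruct both mixtures is kept.
    """
    best = np.inf
    for assign in itertools.product([0, 1], repeat=est.shape[0]):
        mask = np.array(assign, dtype=bool)
        remix_a = est[~mask].sum(axis=0)  # sources assigned to mixture A
        remix_b = est[mask].sum(axis=0)   # sources assigned to mixture B
        loss = 0.5 * (neg_snr(mix_a, remix_a) + neg_snr(mix_b, remix_b))
        best = min(best, loss)
    return best


# Toy usage with random signals standing in for speech.
rng = np.random.default_rng(0)
s0, s1, s2, s3 = rng.standard_normal((4, 1000))
perfect = supervised_loss(np.stack([s0, s2, s1]), s0, [s1, s2])
swapped = supervised_loss(np.stack([s1, s0, s2]), s0, [s1, s2])
remixed = remix_loss(np.stack([s0, s2, s1, s3]), s0 + s1, s2 + s3)
```

In this toy run, `perfect` is strongly negative because the target sits in channel 0 and the interferers match under some permutation, while `swapped` is penalised because the target landed in the wrong channel. The exhaustive assignment search in `remix_loss` grows as 2^n in the number of outputs, which is acceptable only for a small number of output channels.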
Pages: 151-163
Number of pages: 13