Multi-Cue Guided Semi-Supervised Learning Toward Target Speaker Separation in Real Environments

Times cited: 1
Authors
Xu, Jiaming [1]
Cui, Jian [2, 3]
Hao, Yunzhe [2, 3]
Xu, Bo [2, 3, 4]
Affiliations
[1] Xiaomi Corp, Beijing 100085, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 101408, Peoples R China
[4] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Shanghai 200031, Peoples R China
Keywords
Cocktail party problem; target speaker separation; multi-cue guided separation; semi-supervised learning; speech recognition; extraction
DOI
10.1109/TASLP.2023.3323856
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
To address the cocktail party problem in real multi-talker environments, this article proposes a multi-cue guided semi-supervised target speaker separation method (MuSS). MuSS integrates three target-speaker-related cues: spatial, visual, and voiceprint cues. Guided by these cues, the model separates the target speaker into a predefined output channel and separates the interfering sources into the remaining output channels under the optimal permutation. Both synthetic and real mixtures are used for semi-supervised training. For synthetic mixtures, the separated target source and the separated interfering sources are trained to reconstruct the ground-truth references. For real mixtures, two real mixtures are summed into a mixture of mixtures and fed into the separation model, and the separated sources are remixed to reconstruct the two original real mixtures. In addition, to facilitate finetuning and evaluating the estimated sources on real mixtures, we introduce RealMuSS, a real multi-modal speech separation dataset collected in real-world scenarios that comprises more than one hundred hours of multi-talker mixtures with high-quality pseudo references for the target speakers. Experimental results show that the pseudo references effectively improve finetuning efficiency and enable the model both to learn speech estimation on real mixtures and to be evaluated on them, and that under our semi-supervised learning framework, various cue-driven separation models achieve substantial gains in signal-to-noise ratio and speech recognition accuracy.
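The abstract describes two training objectives without giving their exact form. The sketch below is one possible reading, not the authors' implementation: it assumes a negative-SNR loss, treats output channel 0 as the predefined target channel, matches the interfering channels with a brute-force permutation search (as in permutation invariant training), and handles real mixtures with a brute-force remix assignment in the style of mixture invariant training (MixIT). The function names `snr_db`, `synthetic_loss`, and `remix_loss` and the flag `pin_target_to_first` are hypothetical.

```python
import itertools
import math
from typing import List, Sequence

Signal = Sequence[float]


def snr_db(estimate: Signal, reference: Signal, eps: float = 1e-8) -> float:
    """Plain SNR in dB: 10 * log10(||reference||^2 / ||reference - estimate||^2)."""
    signal_power = sum(r * r for r in reference)
    error_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps keeps the ratio finite and positive for perfect or silent estimates.
    return 10.0 * math.log10((signal_power + eps) / (error_power + eps))


def synthetic_loss(estimates: Sequence[Signal], target_ref: Signal,
                   interferer_refs: Sequence[Signal]) -> float:
    """Supervised loss on a synthetic mixture (lower is better).

    ASSUMPTION: channel 0 is the predefined target channel; the remaining
    channels are matched to the interfering references under the permutation
    with the lowest loss. Requires at least as many non-target channels as
    interfering references.
    """
    target_term = -snr_db(estimates[0], target_ref)
    others = estimates[1:]
    # Try every ordered choice of output channels for the interferers.
    interferer_term = min(
        sum(-snr_db(others[ch], ref) for ch, ref in zip(choice, interferer_refs))
        for choice in itertools.permutations(range(len(others)), len(interferer_refs))
    )
    return target_term + interferer_term


def remix_loss(estimates: Sequence[Signal], mix1: Signal, mix2: Signal,
               pin_target_to_first: bool = True) -> float:
    """Unsupervised loss on the sum of two real mixtures (MixIT-style).

    Each separated source is assigned to exactly one of the two mixtures, and
    the assignment whose remixes best reconstruct mix1 and mix2 is used.
    ASSUMPTION (hypothetical flag): the cue refers to a speaker in mix1, so
    channel 0 is always remixed into mix1 when pin_target_to_first is True.
    """
    n = len(mix1)
    free = len(estimates) - 1 if pin_target_to_first else len(estimates)
    best = math.inf
    for tail in itertools.product((0, 1), repeat=free):
        assignment = ((0,) + tail) if pin_target_to_first else tail
        remixes: List[List[float]] = [[0.0] * n, [0.0] * n]
        for source, which in zip(estimates, assignment):
            for t in range(n):
                remixes[which][t] += source[t]
        loss = -snr_db(remixes[0], mix1) - snr_db(remixes[1], mix2)
        best = min(best, loss)
    return best


if __name__ == "__main__":
    target = [1.0, 0.0, 1.0, 0.0]
    interferer = [0.0, 1.0, 0.0, 1.0]
    noise = [0.5, 0.5, -0.5, -0.5]
    # Perfect separation with the target in channel 0 gives a large negative loss.
    print(synthetic_loss([target, noise, interferer], target, [interferer]))
    # Real-mixture case: mix1 = target + noise, mix2 = interferer.
    mix1 = [t + e for t, e in zip(target, noise)]
    print(remix_loss([target, noise, interferer], mix1, interferer))
```

Pinning channel 0 to the first mixture is what keeps the remix objective tied to the cue: without it, an unconstrained assignment could reconstruct both mixtures perfectly even when the target channel holds the wrong speaker.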
Pages: 151-163
Number of pages: 13