Multi-Cue Guided Semi-Supervised Learning Toward Target Speaker Separation in Real Environments

Cited by: 1

Authors:

Xu, Jiaming [1]

Cui, Jian [2,3]

Hao, Yunzhe [2,3]

Xu, Bo [2,3,4]

Affiliations:

[1] Xiaomi Corp, Beijing 100085, Peoples R China

[2] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China

[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 101408, Peoples R China

[4] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Shanghai 200031, Peoples R China

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2024 / Vol. 32

Keywords:

Cocktail party problem; target speaker separation; multi-cue guided separation; semi-supervised learning; SPEECH RECOGNITION; EXTRACTION;

DOI:

10.1109/TASLP.2023.3323856

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

To solve the cocktail party problem in real multi-talker environments, this article proposes a multi-cue guided semi-supervised target speaker separation method (MuSS). MuSS integrates three cues related to the target speaker: spatial, visual, and voiceprint. Under the guidance of these cues, the target speaker is separated into a predefined output channel, and the interfering sources are separated into the other output channels with the optimal permutation. Both synthetic and real mixtures are used for semi-supervised training. For synthetic mixtures, the separated target source and the separated interfering sources are trained to reconstruct the ground-truth references. For real mixtures, the sum of two real mixtures is fed into the separation model, and the separated sources are remixed to reconstruct the two real mixtures. In addition, to facilitate finetuning and evaluation of the estimated source on real mixtures, we introduce a real multi-modal speech separation dataset, RealMuSS, which was collected in real-world scenarios and comprises more than one hundred hours of multi-talker mixtures with high-quality pseudo references of the target speakers. Experimental results show that the pseudo references improve finetuning efficiency and make it possible to train and evaluate the model on speech estimated from real mixtures, and that various cue-driven separation models improve substantially in signal-to-noise ratio and speech recognition accuracy under the semi-supervised learning framework.
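The remixing step for real mixtures can be illustrated with a small sketch. The abstract does not state the loss function or how sources are assigned back to the two mixtures, so everything below is an assumption: the function name `remix_loss` is hypothetical, the error measure is plain mean-squared error, and the assignment is found by exhaustive search over all binary assignments (in the style of mixture-invariant training). It is not the authors' implementation, and it ignores the cue-guided target channel.

```python
import itertools

import numpy as np


def remix_loss(separated, mix_a, mix_b):
    """Find the source-to-mixture assignment with the lowest remix error.

    separated: array of shape (n_sources, n_samples), the model outputs
    mix_a, mix_b: arrays of shape (n_samples,), the two real mixtures
    Returns (best_loss, best_assignment), where best_assignment[i] == 0
    sends source i to mix_a and 1 sends it to mix_b.
    Assumption: MSE is used as the reconstruction error.
    """
    n_sources = separated.shape[0]
    best_loss, best_assignment = np.inf, None
    # Try every way of splitting the separated sources between the mixtures.
    for assignment in itertools.product((0, 1), repeat=n_sources):
        mask = np.array(assignment, dtype=bool)
        est_a = separated[~mask].sum(axis=0)  # sources remixed into mix_a
        est_b = separated[mask].sum(axis=0)   # sources remixed into mix_b
        loss = np.mean((est_a - mix_a) ** 2) + np.mean((est_b - mix_b) ** 2)
        if loss < best_loss:
            best_loss, best_assignment = loss, assignment
    return best_loss, best_assignment


# Toy check with synthetic "sources": two mixtures of two sources each.
rng = np.random.default_rng(0)
sources = rng.standard_normal((4, 16000))
mixture_a = sources[0] + sources[2]
mixture_b = sources[1] + sources[3]
loss, assignment = remix_loss(sources, mixture_a, mixture_b)
```

In this toy case the separated sources are perfect, so the search recovers the assignment `(0, 1, 0, 1)` with a loss of essentially zero. The exhaustive search grows as 2^n in the number of output channels, which is acceptable only for the small channel counts typical of separation models.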


Pages: 151-163

Number of pages: 13
