LEARNING TO SEPARATE SOUNDS FROM WEAKLY LABELED SCENES

Cited by: 0

Authors:

Pishdadian, Fatemeh [1, 2]

Wichern, Gordon [1]

Le Roux, Jonathan [1]

Affiliations:

[1] Mitsubishi Elect Res Labs MERL, Cambridge, MA 02139 USA

[2] Northwestern Univ, Interact Audio Lab, Evanston, IL 60208 USA

Source:

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020

Keywords:

audio source separation; semi-supervised classification; weakly-labeled data; SPEECH SEPARATION; EVENT DETECTION;

DOI:

10.1109/icassp40776.2020.9053055

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Deep learning models for monaural audio source separation are typically trained on large collections of isolated sources, which may not be available in domains such as environmental monitoring. We propose objective functions and network architectures that enable training a source separation system with weak labels. In contrast with strong time-frequency (TF) labels, weak labels only indicate the time periods where different sources are active in this scenario. We train a separator that outputs a TF mask for each type of sound event, using a classifier to pool label estimates across frequency. Our objective function requires the classifier applied to a separated source to output weak labels for the class corresponding to that source and zeros for all other classes. The objective function also enforces that the separated sources sum to the mixture. We benchmark performance using synthetic mixtures of overlapping sound events recorded in urban environments. Compared to training on mixtures and their isolated sources, our model still achieves significant SDR improvement.
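
The objective described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not specify the loss forms, so binary cross-entropy for the classification term, an L1 penalty for the mixture-consistency term, the weight alpha, and the band-energy toy_classifier are all assumptions chosen only for illustration.

```python
import numpy as np

def bce(p, y, eps=1e-7):
    """Mean binary cross-entropy between probabilities p and targets y."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))

def weak_label_objective(masks, mixture, classify, weak_labels, alpha=1.0):
    """Sketch of a weak-label separation objective (loss forms are assumed).

    masks:       (C, F, T) TF masks, one per sound class
    mixture:     (F, T) magnitude spectrogram of the mixture
    classify:    callable mapping an (F, T) spectrogram to (C, T) probabilities
    weak_labels: (C, T) binary activity per class and time frame
    """
    n_classes = masks.shape[0]
    sources = masks * mixture[None]  # (C, F, T) separated spectrograms
    cls_loss = 0.0
    for c in range(n_classes):
        # Target for separated source c: its own weak labels, zeros elsewhere.
        target = np.zeros_like(weak_labels, dtype=float)
        target[c] = weak_labels[c]
        cls_loss += bce(classify(sources[c]), target)
    cls_loss /= n_classes
    # Mixture consistency: the separated sources should sum to the mixture.
    recon_loss = float(np.mean(np.abs(sources.sum(axis=0) - mixture)))
    return cls_loss + alpha * recon_loss

def toy_classifier(spec):
    """Hypothetical stand-in classifier: pools energy over a fixed frequency
    band per class (bins 0-1 for class 0, bins 2-3 for class 1)."""
    band_energy = np.stack([spec[0:2].sum(axis=0), spec[2:4].sum(axis=0)])
    return 1.0 - np.exp(-band_energy)  # (2, T), values in [0, 1)

# Two classes, four frequency bins, three frames.
mixture = np.zeros((4, 3))
mixture[0:2, 0:2] = 5.0  # class 0 is active in frames 0-1
mixture[2:4, 1:3] = 5.0  # class 1 is active in frames 1-2
weak_labels = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])

ideal_masks = np.zeros((2, 4, 3))
ideal_masks[0, 0:2] = 1.0
ideal_masks[1, 2:4] = 1.0
swapped_masks = ideal_masks[::-1].copy()  # each source assigned to the wrong class

ideal_loss = weak_label_objective(ideal_masks, mixture, toy_classifier, weak_labels)
swapped_loss = weak_label_objective(swapped_masks, mixture, toy_classifier, weak_labels)
print(ideal_loss < swapped_loss)
```

In this toy setup both mask sets sum to one in every TF bin, so the mixture-consistency term is zero for both, and only the classification term distinguishes the correct assignment from the swapped one.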


Pages: 91-95

Page count: 5
