LEARNING TO SEPARATE SOUNDS FROM WEAKLY LABELED SCENES

Cited by: 0
Authors
Pishdadian, Fatemeh [1, 2]
Wichern, Gordon [1]
Le Roux, Jonathan [1]
Affiliations
[1] Mitsubishi Elect Res Labs MERL, Cambridge, MA 02139 USA
[2] Northwestern Univ, Interact Audio Lab, Evanston, IL 60208 USA
Keywords
audio source separation; semi-supervised classification; weakly-labeled data; speech separation; event detection
DOI
10.1109/icassp40776.2020.9053055
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Deep learning models for monaural audio source separation are typically trained on large collections of isolated sources, which may not be available in domains such as environmental monitoring. We propose objective functions and network architectures that enable training a source separation system with weak labels. In this scenario, in contrast with strong time-frequency (TF) labels, weak labels indicate only the time periods in which different sources are active. We train a separator that outputs a TF mask for each type of sound event, using a classifier to pool label estimates across frequency. Our objective function requires the classifier applied to a separated source to output weak labels for the class corresponding to that source and zeros for all other classes. The objective function also enforces that the separated sources sum to the mixture. We benchmark performance using synthetic mixtures of overlapping sound events recorded in urban environments. Compared to training on mixtures and their isolated sources, our model still achieves significant signal-to-distortion ratio (SDR) improvement.
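The two terms of the objective described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the exact loss functions, so the binary cross-entropy classification term, the L1 mixture-consistency term, the weight `alpha`, and the hypothetical `toy_classifier` (which pools energy across frequency bands) are all assumptions made for illustration.

```python
import numpy as np

def bce(p, y, eps=1e-7):
    """Mean binary cross-entropy between probabilities p and targets y."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def toy_classifier(spec, n_classes=2):
    """Hypothetical stand-in classifier (assumption, not from the paper).

    Splits the frequency axis into one band per class, pools energy across
    frequency in each band, and maps it to a per-frame probability.
    spec: (F, T) magnitude spectrogram -> (n_classes, T) probabilities.
    """
    bands = np.array_split(spec, n_classes, axis=0)
    energy = np.stack([band.mean(axis=0) for band in bands])
    return 1.0 - np.exp(-energy)

def weak_label_loss(masks, mixture, classify, weak_labels, alpha=1.0):
    """Assumed form of the weak-label objective.

    masks:       (C, F, T) TF masks in [0, 1], one per sound class
    mixture:     (F, T) mixture magnitude spectrogram
    classify:    callable mapping an (F, T) spectrogram to (C, T) probabilities
    weak_labels: (C, T) binary frame-level activity labels
    """
    n_classes = masks.shape[0]
    sources = masks * mixture[None, :, :]
    cls_loss = 0.0
    for c in range(n_classes):
        # Source c should trigger its own class labels and zeros elsewhere.
        target = np.zeros_like(weak_labels, dtype=float)
        target[c] = weak_labels[c]
        cls_loss += bce(classify(sources[c]), target)
    cls_loss /= n_classes
    # Separated sources should sum back to the mixture.
    mix_loss = float(np.mean(np.abs(sources.sum(axis=0) - mixture)))
    return cls_loss + alpha * mix_loss
```

In this sketch, masks that route each frequency band to the class that is labeled active in the corresponding frames yield a lower loss than masks with the two classes swapped, and all-zero masks are penalized by the mixture-consistency term.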
Pages: 91-95
Page count: 5