CONTRASTIVE LOSS BASED FRAME-WISE FEATURE DISENTANGLEMENT FOR POLYPHONIC SOUND EVENT DETECTION

Cited by: 1
Authors
Guan, Yadong [1]
Han, Jiqing [1]
Song, Hongwei [1]
Song, Wenjie [1]
Zheng, Guibin [1]
Zheng, Tieran [1]
He, Yongjun [1]
Affiliations
[1] Harbin Institute of Technology, School of Computer Science and Technology, Harbin, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Polyphonic Sound Event Detection; Feature Disentanglement; Contrastive Loss;
DOI
10.1109/ICASSP48485.2024.10447743
Abstract
Overlapping sound events are ubiquitous in real-world environments, but existing end-to-end sound event detection (SED) methods still struggle to detect them effectively. A critical reason is that these methods represent overlapping events with shared, entangled frame-wise features, which degrades feature discrimination. To solve this problem, we propose a disentangled feature learning framework that learns category-specific representations. Specifically, we employ a different projector to learn the frame-wise features for each category. To ensure that these features do not contain information about other categories, we propose a frame-wise contrastive loss that maximizes the common information between frame-wise features within the same category. In addition, because the labeled data available to the proposed method is limited, we propose a semi-supervised frame-wise contrastive loss that leverages large amounts of unlabeled data to achieve feature disentanglement. Experimental results demonstrate the effectiveness of our method.
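The abstract gives no formula for the frame-wise contrastive loss, so the following is only a rough sketch of the general idea, not the authors' formulation. It assumes a linear projector per category, cosine similarity between frames, a temperature of 0.1, and a supervised-contrastive-style objective in which frames sharing the same activity label for a category are treated as positives. The names `project` and `frame_contrastive_loss` are hypothetical, and the semi-supervised variant is not covered.

```python
import math


def project(frames, weight):
    """Apply one category's (assumed linear) projector to each shared frame feature."""
    return [[sum(w * x for w, x in zip(row, frame)) for row in weight]
            for frame in frames]


def _normalize(v):
    # Scale to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def frame_contrastive_loss(features, active, temperature=0.1):
    """Illustrative contrastive loss over the frames of a single category.

    features: list of T frame vectors produced by that category's projector.
    active:   list of T 0/1 flags, 1 if the category is active in the frame.
    Frames with the same flag are positives; all other frames are contrasts.
    """
    z = [_normalize(f) for f in features]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and active[j] == active[i]]
        if not positives:
            continue  # this anchor has no positive partner
        logits = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                  for j in range(n) if j != i}
        # Numerically stable log-sum-exp over every frame except the anchor.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy usage: four frames, two of which lie near each axis.
shared = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in projector weights
z = project(shared, identity)
aligned = frame_contrastive_loss(z, [1, 1, 0, 0])  # labels match the geometry
mixed = frame_contrastive_loss(z, [1, 0, 1, 0])    # labels cut across it
```

In this toy case the loss is lower when the activity labels agree with the feature geometry, which is the behavior a disentangling objective of this kind is meant to reward. In a full system one such loss would presumably be computed per category and summed.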
Pages: 1021-1025
Page count: 5