CONTRASTIVE LOSS BASED FRAME-WISE FEATURE DISENTANGLEMENT FOR POLYPHONIC SOUND EVENT DETECTION

Cited by: 1
Authors
Guan, Yadong [1]
Han, Jiqing [1]
Song, Hongwei [1]
Song, Wenjie [1]
Zheng, Guibin [1]
Zheng, Tieran [1]
He, Yongjun [1]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Polyphonic Sound Event Detection; Feature Disentanglement; Contrastive Loss
DOI
10.1109/ICASSP48485.2024.10447743
Abstract
Overlapping sound events are ubiquitous in real-world environments, but existing end-to-end sound event detection (SED) methods still struggle to detect them effectively. A critical reason is that these methods represent overlapping events with shared, entangled frame-wise features, which degrades feature discrimination. To address this problem, we propose a disentangled feature learning framework that learns category-specific representations. Specifically, we employ a separate projector to learn the frame-wise features of each category. To ensure that these features do not contain information about other categories, we maximize the common information between frame-wise features within the same category and propose a frame-wise contrastive loss. In addition, because the labeled data available to the proposed method is limited, we propose a semi-supervised frame-wise contrastive loss that can leverage large amounts of unlabeled data to achieve feature disentanglement. Experimental results demonstrate the effectiveness of our method.
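The abstract does not give the loss formula, so the following is only a minimal sketch of the general idea under stated assumptions: each category gets its own linear projector, and a supervised-contrastive (InfoNCE-style) loss pulls together the projected features of frames in which that category is active. The function names (`project`, `frame_contrastive_loss`, `disentangled_loss`), the cosine similarity, the temperature value, and the category names are all hypothetical choices for illustration, not the authors' implementation. The semi-supervised variant is not sketched here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)

def project(frames, weight):
    """Apply one category's linear projector (rows of `weight`) to every frame."""
    return [[sum(w * x for w, x in zip(row, frame)) for row in weight]
            for frame in frames]

def frame_contrastive_loss(feats, active, temperature=0.1):
    """Assumed stand-in loss for ONE category.

    feats:  projected features, one vector per frame.
    active: 0/1 label per frame (1 = the category is present in that frame).
    Pairs of active frames are positives; all other frames act as negatives.
    """
    positives = [i for i, a in enumerate(active) if a]
    if len(positives) < 2:
        return 0.0  # no positive pair available in this clip
    total, pairs = 0.0, 0
    for i in positives:
        sims = {j: math.exp(cosine(feats[i], feats[j]) / temperature)
                for j in range(len(feats)) if j != i}
        denom = sum(sims.values())
        for j in positives:
            if j != i:
                total -= math.log(sims[j] / denom)
                pairs += 1
    return total / pairs

def disentangled_loss(frames, projectors, labels, temperature=0.1):
    """Sum the per-category losses, each computed on its own projection."""
    return sum(
        frame_contrastive_loss(project(frames, projectors[c]), labels[c], temperature)
        for c in projectors
    )

# Toy usage: four 2-D frames and two hypothetical categories.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
projectors = {"speech": identity, "dog_bark": identity}
labels = {"speech": [1, 1, 0, 0], "dog_bark": [0, 0, 1, 1]}
loss = disentangled_loss(frames, projectors, labels)
```

In this toy case the active frames of each category already share the same direction, so the loss is close to zero; labels that pair dissimilar frames yield a much larger value.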
Pages: 1021-1025
Number of pages: 5