CONTRASTIVE LOSS BASED FRAME-WISE FEATURE DISENTANGLEMENT FOR POLYPHONIC SOUND EVENT DETECTION

Cited by: 1
Authors
Guan, Yadong [1]
Han, Jiqing [1]
Song, Hongwei [1]
Song, Wenjie [1]
Zheng, Guibin [1]
Zheng, Tieran [1]
He, Yongjun [1]
Affiliations
[1] Harbin Institute of Technology, School of Computer Science and Technology, Harbin, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Polyphonic Sound Event Detection; Feature Disentanglement; Contrastive Loss;
DOI
10.1109/ICASSP48485.2024.10447743
Abstract
Overlapping sound events are ubiquitous in real-world environments, but existing end-to-end sound event detection (SED) methods still struggle to detect them effectively. A critical reason is that these methods represent overlapping events using shared and entangled frame-wise features, which degrades feature discrimination. To address this problem, we propose a disentangled feature learning framework that learns a category-specific representation. Specifically, we employ different projectors to learn the frame-wise features for each category. To ensure that these features do not contain information from other categories, we maximize the common information between frame-wise features within the same category through a proposed frame-wise contrastive loss. In addition, considering that the labeled data available to the proposed method is limited, we propose a semi-supervised frame-wise contrastive loss that can leverage large amounts of unlabeled data to achieve feature disentanglement. The experimental results demonstrate the effectiveness of our method.
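The abstract does not give the form of the frame-wise contrastive loss, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes a supervised-contrastive (InfoNCE-style) objective over the frames of a single category: frames in which the category is active are pulled together and pushed away from frames in which it is inactive. The function names, the cosine similarity measure, and the temperature `tau` are all assumptions made for illustration; the features are taken to be the output of that category's projector.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def frame_contrastive_loss(frames, active, tau=0.1):
    """Illustrative contrastive loss over the frames of ONE category.

    frames: list of feature vectors, one per frame (assumed to come from
            the projector dedicated to this category)
    active: list of 0/1 flags, 1 if the category is active in that frame
    tau:    temperature (hypothetical default, not taken from the paper)
    """
    n = len(frames)
    total, anchors = 0.0, 0
    for i in range(n):
        if not active[i]:
            continue  # only active frames act as anchors
        positives = [j for j in range(n) if j != i and active[j]]
        if not positives:
            continue  # an anchor needs at least one other active frame
        # Scaled similarities between the anchor and every other frame.
        logits = {j: cosine(frames[i], frames[j]) / tau
                  for j in range(n) if j != i}
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Average negative log-probability of the positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy check: the loss is lower when active frames share similar features.
active = [1, 1, 0, 0]
clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
print(frame_contrastive_loss(clustered, active) < frame_contrastive_loss(mixed, active))
```

In this sketch, one such loss would be computed per category on that category's projected features and the results summed. How the semi-supervised variant obtains frame-level targets for unlabeled data is not stated in the abstract and is not modeled here.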
Pages: 1021-1025
Number of pages: 5