CONTRASTIVE LOSS BASED FRAME-WISE FEATURE DISENTANGLEMENT FOR POLYPHONIC SOUND EVENT DETECTION

Cited by: 1

Authors:

Guan, Yadong [1]

Han, Jiqing [1]

Song, Hongwei [1]

Song, Wenjie [1]

Zheng, Guibin [1]

Zheng, Tieran [1]

He, Yongjun [1]

Affiliations:

[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China

Source:

2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024

Funding:

National Natural Science Foundation of China;

Keywords:

Polyphonic Sound Event Detection; Feature Disentanglement; Contrastive Loss;

DOI:

10.1109/ICASSP48485.2024.10447743

Chinese Library Classification (CLC) number:

Subject classification code:

Abstract:

Overlapping sound events are ubiquitous in real-world environments, but existing end-to-end sound event detection (SED) methods still struggle to detect them effectively. A critical reason is that these methods represent overlapping events using shared, entangled frame-wise features, which degrades feature discrimination. To solve this problem, we propose a disentangled feature learning framework that learns category-specific representations. Specifically, we employ a different projector for each category to learn its frame-wise features. To ensure that these features do not contain information from other categories, we maximize the common information between frame-wise features within the same category and propose a frame-wise contrastive loss. In addition, because the labeled data available to the proposed method is limited, we propose a semi-supervised frame-wise contrastive loss that can leverage large amounts of unlabeled data to achieve feature disentanglement. The experimental results demonstrate the effectiveness of our method.
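
As a rough illustration of the idea in the abstract, the sketch below computes a generic supervised contrastive loss over the frames of a single category. It is not the paper's formulation, which this record does not give. The function names, the cosine similarity, the `temperature` parameter, and the rule that frames where the event is active are positives for each other are all assumptions made for illustration. The input `features` stands in for the output of one hypothetical category-specific projector.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (small epsilon avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def framewise_contrastive_loss(features, active, temperature=0.1):
    """Generic supervised contrastive loss over the frames of ONE category.

    features: list of per-frame vectors (assumed output of that category's projector).
    active:   list of 0/1 flags, 1 where the category's event is present in the frame.
    Assumption: active frames are mutual positives; all other frames are negatives.
    """
    anchors = [i for i, flag in enumerate(active) if flag]
    total, count = 0.0, 0
    for i in anchors:
        positives = [j for j in anchors if j != i]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        # Scaled similarities between the anchor and every other frame.
        logits = {
            j: cosine(features[i], features[j]) / temperature
            for j in range(len(features))
            if j != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        # Average negative log-probability assigned to the positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        count += 1
    return total / count if count else 0.0


# Toy example: four frames, the event is active in the first two.
active = [1, 1, 0, 0]
separated = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
entangled = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
loss_separated = framewise_contrastive_loss(separated, active)
loss_entangled = framewise_contrastive_loss(entangled, active)
```

In this toy case the loss is lower when the active frames share a direction that the inactive frames do not, which is the behavior a disentangling objective of this kind is meant to reward.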

Pages: 1021-1025

Page count: 5

Total: 50 items

[1] Frame-wise dynamic threshold based polyphonic acoustic event detection
Xia, Xianjun
Togneri, Roberto
Sohel, Ferdous
Huang, David
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 474 - 478
[2] CNN-based Discriminative Training for Domain Compensation in Acoustic Event Detection with Frame-wise Classifier
Tang, Tiantian
Zhou, Xinyuan
Long, Yanhua
Li, Yijie
Liang, Jiaen
2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 939 - 944
[3] Polyphonic sound event localization and detection using channel-wise FusionNet
Spoorthy, V.
Koolagudi, Shashidhar G.
APPLIED INTELLIGENCE, 2024, 54 (06) : 5015 - 5026
[4] Video Frame-wise Explanation Driven Contrastive Learning for Procedural Text Generation
Wang, Zhihao
Li, Lin
Xie, Zhongwei
Liu, Chuanbo
COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 241
[5] A TRACK-WISE ENSEMBLE EVENT INDEPENDENT NETWORK FOR POLYPHONIC SOUND EVENT LOCALIZATION AND DETECTION
Hu, Jinbo
Cao, Yin
Wu, Ming
Kong, Qiuqiang
Yang, Feiran
Plumbley, Mark D.
Yang, Jun
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 9196 - 9200
[6] A Capsule based Approach for Polyphonic Sound Event Detection
Liu, Yaming
Tang, Jian
Song, Yan
Dai, Lirong
2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 1853 - 1857
[7] Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning
Chen, Minghao
Wei, Fangyun
Li, Chong
Cai, Deng
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 13791 - 13800
[8] Metrics for Polyphonic Sound Event Detection
Mesaros, Annamaria
Heittola, Toni
Virtanen, Tuomas
APPLIED SCIENCES-BASEL, 2016, 6 (06):
[9] Event Specific Attention for Polyphonic Sound Event Detection
Sundar, Harshavardhan
Sun, Ming
Wang, Chao
INTERSPEECH 2021, 2021, : 566 - 570
[10] Robust polyphonic sound event detection by using multi frame size denoising autoencoder
Zhou, Jianchao
Chen, Xiaoou
Yang, Deshun
2018 IEEE 20TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2018,