RCT: Random Consistency Training for Semi-Supervised Sound Event Detection

Cited by: 1

Authors:

Shao, Nian [1,2]

Loweimi, Erfan [3]

Li, Xiaofei [1,2]

Affiliations:

[1] Westlake University, Hangzhou, People's Republic of China

[2] Westlake Institute for Advanced Study, Hangzhou, People's Republic of China

[3] University of Edinburgh, CSTR, Edinburgh, Midlothian, Scotland

Source:

INTERSPEECH 2022 | 2022

Keywords:

semi-supervised learning; sound event detection; data augmentation; consistency regularization; hard mixup;

DOI:

10.21437/Interspeech.2022-10037

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206; 082403;

Abstract:

Sound event detection (SED), a core module of acoustic environmental analysis, suffers from data deficiency. Integrating semi-supervised learning (SSL) largely mitigates this problem. This paper investigates several core modules of SSL and introduces a random consistency training (RCT) strategy. First, a hard mixup data augmentation is proposed to account for the additive property of sounds. Second, a random augmentation scheme is applied to stochastically combine different types of data augmentation methods with high flexibility. Third, a self-consistency loss is proposed and fused with the teacher-student model to stabilize training. The proposed modules outperform their respective competitors, and as a whole the proposed SED strategies achieve 44.0% and 67.1% in terms of the PSDS1 and PSDS2 metrics proposed by the DCASE challenge, notably outperforming other widely used alternatives.
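
The abstract does not give the hard mixup formula. As a rough illustration of what "accounting for the additive property of sounds" could mean, the sketch below assumes that two clips are mixed by summing their samples and that the mixed clip carries the union of both clips' event labels. The function name `hard_mixup`, its arguments, and the exact mixing rule are assumptions made for this example, not the authors' implementation.

```python
from typing import List, Tuple


def hard_mixup(
    wave_a: List[float],
    wave_b: List[float],
    labels_a: List[int],
    labels_b: List[int],
) -> Tuple[List[float], List[int]]:
    """Hypothetical hard mixup: add two clips and take the union of their labels.

    Assumption (not stated in the abstract): sounds superpose additively, so
    the mixed clip contains every event that is active in either source clip.
    """
    if len(wave_a) != len(wave_b):
        raise ValueError("clips must have the same number of samples")
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must cover the same event classes")

    # Sample-wise sum models two sources playing at the same time.
    mixed_wave = [a + b for a, b in zip(wave_a, wave_b)]
    # Element-wise max of binary labels is a logical OR over event classes.
    mixed_labels = [max(a, b) for a, b in zip(labels_a, labels_b)]
    return mixed_wave, mixed_labels


if __name__ == "__main__":
    wave, labels = hard_mixup([0.5, -0.25], [0.25, 0.25], [1, 0, 0], [0, 0, 1])
    print(wave, labels)
```

Unlike standard mixup, which interpolates both inputs and labels with a weight between 0 and 1, this variant keeps the labels binary, which is one plausible reading of "hard".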


Pages: 1541-1545

Page count: 5
