Multi-rate modulation encoding via unsupervised learning for audio event detection

Cited by: 1

Authors:

Kothinti, Sandeep Reddy [1]

Elhilali, Mounya [1]

Affiliations:

[1] Johns Hopkins Univ, Lab Computat Auditory Percept, Baltimore, MD 21218 USA

Source:

EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING | 2024 / Vol. 2024 / Issue 01

Keywords:

Audio event detection; Multi-rate processing; Temporal contrastive loss; Unsupervised learning; Variational autoencoder; TEMPORAL COHERENCE;

DOI:

10.1186/s13636-024-00339-5

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Technologies in healthcare, smart homes, security, ecology, and entertainment all deploy audio event detection (AED) to detect sound events in an audio recording. Effective AED techniques rely heavily on supervised or semi-supervised models to capture the wide range of dynamics spanned by sound events in order to achieve temporally precise boundaries and accurate event classification. These methods require extensive collections of labeled or weakly labeled in-domain data, which are costly and labor-intensive to obtain. Importantly, these approaches do not fully leverage the inherent variability and range of dynamics across sound events, aspects that can be effectively identified through unsupervised methods. The present work proposes an approach based on multi-rate autoencoders that are pretrained in an unsupervised way to leverage unlabeled audio data and ultimately learn the rich temporal dynamics inherent in natural sound events. This approach utilizes parallel autoencoders that achieve decompositions of the modulation spectrum along different bands. In addition, we introduce a rate-selective temporal contrastive loss to align the training objective with event detection metrics. Optimizing the configuration of multi-rate encoders and the temporal contrastive loss leads to notable improvements in domestic sound event detection in the context of the DCASE challenge.

Pages: 13

