Multi-rate modulation encoding via unsupervised learning for audio event detection

Cited by: 1
Authors
Kothinti, Sandeep Reddy [1]
Elhilali, Mounya [1]
Affiliations
[1] Johns Hopkins Univ, Lab Computat Auditory Percept, Baltimore, MD 21218 USA
Keywords
Audio event detection; Multi-rate processing; Temporal contrastive loss; Unsupervised learning; Variational autoencoder; Temporal coherence
DOI
10.1186/s13636-024-00339-5
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206; 082403
Abstract
Technologies in healthcare, smart homes, security, ecology, and entertainment all deploy audio event detection (AED) in order to detect sound events in an audio recording. Effective AED techniques rely heavily on supervised or semi-supervised models to capture the wide range of dynamics spanned by sound events in order to achieve temporally precise boundaries and accurate event classification. These methods require extensive collections of labeled or weakly labeled in-domain data, which is costly and labor-intensive. Importantly, these approaches do not fully leverage the inherent variability and range of dynamics across sound events, aspects that can be effectively identified through unsupervised methods. The present work proposes an approach based on multi-rate autoencoders that are pretrained in an unsupervised way to leverage unlabeled audio data and ultimately learn the rich temporal dynamics inherent in natural sound events. This approach utilizes parallel autoencoders that achieve decompositions of the modulation spectrum along different bands. In addition, we introduce a rate-selective temporal contrastive loss to align the training objective with event detection metrics. Optimizing the configuration of multi-rate encoders and the temporal contrastive loss leads to notable improvements in domestic sound event detection in the context of the DCASE challenge.
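As a rough illustration of what decomposing a modulation spectrum along different bands can mean, the sketch below splits a temporal envelope into modulation-rate bands using simple FFT masks. This is not the learned multi-rate autoencoder described in the abstract; it is a fixed, hand-written filterbank, and the function name `split_modulation_bands`, the frame rate, and the band edges are all assumptions chosen for the example.

```python
import numpy as np

def split_modulation_bands(envelope, frame_rate, band_edges_hz):
    """Split a 1-D temporal envelope into modulation-rate bands (hypothetical helper).

    Each band keeps only the FFT bins whose modulation rate falls in
    [low, high); bands that tile the whole rate axis sum back to the input.
    """
    n = len(envelope)
    spectrum = np.fft.rfft(envelope)
    # Modulation rate (in Hz) associated with each FFT bin.
    rates = np.fft.rfftfreq(n, d=1.0 / frame_rate)
    bands = []
    for low, high in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        mask = (rates >= low) & (rates < high)
        bands.append(np.fft.irfft(spectrum * mask, n=n))
    return np.stack(bands)

# Assumed setup: 100 feature frames per second, 2 seconds of signal.
frame_rate = 100.0
t = np.arange(200) / frame_rate
slow = np.sin(2 * np.pi * 2.0 * t)         # slow (2 Hz) modulation
fast = 0.5 * np.sin(2 * np.pi * 20.0 * t)  # fast (20 Hz) modulation
envelope = slow + fast

# Two assumed bands: below 8 Hz, and 8 Hz up to (and including) Nyquist.
edges = [0.0, 8.0, frame_rate / 2 + 1.0]
bands = split_modulation_bands(envelope, frame_rate, edges)
```

With these assumed band edges, the first row of `bands` carries the slow component and the second row the fast one. A learned approach would replace the fixed masks with trained encoders operating at different rates.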
Pages: 13
Related papers
50 in total
  • [1] Multi-rate modulation encoding via unsupervised learning for audio event detection
    Sandeep Reddy Kothinti
    Mounya Elhilali
    [J]. EURASIP Journal on Audio, Speech, and Music Processing, 2024
  • [2] SOUND EVENT DETECTION IN URBAN AUDIO WITH SINGLE AND MULTI-RATE PCEN
    Ick, Christopher
    McFee, Brian
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 880 - 884
  • [3] UNSUPERVISED DISCRIMINATIVE LEARNING OF SOUNDS FOR AUDIO EVENT CLASSIFICATION
    Hornauer, Sascha
    Li, Ke
    Yu, Stella X.
    Ghaffarzadegan, Shabnam
    Ren, Liu
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3035 - 3039
  • [4] Fast Multi-Rate Encoding for Adaptive HTTP Streaming
    Amirpour, Hadi
    Cetinkaya, Ekrem
    Timmerer, Christian
    Ghanbari, Mohammad
    [J]. 2020 DATA COMPRESSION CONFERENCE (DCC 2020), 2020, : 358 - 358
  • [5] Multi-rate encoding of a video sequence in the DCT domain
    Zaccarin, A
    Yeo, BL
    [J]. 2002 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL II, PROCEEDINGS, 2002, : 680 - 683
  • [6] Fast Multi-Resolution and Multi-Rate Encoding for HTTP Adaptive Streaming Using Machine Learning
    Cetinkaya, Ekrem
    Amirpour, Hadi
    Timmerer, Christian
    Ghanbari, Mohammad
    [J]. IEEE OPEN JOURNAL OF SIGNAL PROCESSING, 2021, 2 : 484 - 495
  • [7] Minimum Transmission Time Encoding in Multi-rate Wireless Networks
    Wang, Qingshan
    Xu, Yinlong
    Wang, Qi
    Guo, Qingwei
    [J]. PROCEEDINGS OF THE 2009 PACIFIC-ASIA CONFERENCE ON CIRCUITS, COMMUNICATIONS AND SYSTEM, 2009, : 194 - +
  • [8] Multi-Rate Deep Learning for Temporal Recommendation
    Song, Yang
    Elkahky, Ali Mamdouh
    He, Xiaodong
    [J]. SIGIR'16: PROCEEDINGS OF THE 39TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2016, : 909 - 912
  • [9] Multiuser detection for integrated multi-rate CDMA
    Ge, HY
    [J]. ICICS - PROCEEDINGS OF 1997 INTERNATIONAL CONFERENCE ON INFORMATION, COMMUNICATIONS AND SIGNAL PROCESSING, VOLS 1-3: THEME: TRENDS IN INFORMATION SYSTEMS ENGINEERING AND WIRELESS MULTIMEDIA COMMUNICATIONS, 1997, : 858 - 862
  • [10] An unsupervised learning approach to musical event detection
    Gao, S
    Lee, CH
    Zhu, YW
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXP (ICME), VOLS 1-3, 2004, : 1307 - 1310