Evaluating metric and contrastive learning in pretrained models for environmental sound classification

Cited by: 0

Authors
Chen, Feilong [1 ]
Zhu, Zhenjun [1 ]
Sun, Chengli [1 ,2 ]
Xia, Linqing [3 ]
Affiliations
[1] Nanchang Hangkong Univ, Sch Informat & Engn, Nanchang 330063, Peoples R China
[2] Guangzhou Maritime Univ, Sch Informat & Commun Engn, Guangzhou 510725, Peoples R China
[3] Shanghai Digiot Technol Co Ltd, Shanghai 200082, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Environmental sound classification; Metric learning; Contrastive learning; Lightweight pretrained model; Unmanned aerial vehicle;
DOI
10.1016/j.apacoust.2025.110593
Chinese Library Classification
O42 [Acoustics];
Discipline codes
070206; 082403;
Abstract
Environmental Sound Classification (ESC) has advanced significantly with the advent of deep learning techniques. This study conducts a comprehensive evaluation of contrastive and metric learning approaches in ESC, introducing the ESC51 dataset, an extension of the ESC50 benchmark that incorporates noise samples from quadrotor Unmanned Aerial Vehicles (UAVs). To enhance classification performance and the discriminative power of embedding spaces, we propose a novel metric learning-based approach, SoundMLR, which employs a hybrid loss function emphasizing metric learning principles. Experimental results demonstrate that SoundMLR consistently outperforms contrastive learning methods in terms of classification accuracy and inference latency, particularly when applied to the lightweight MobileNetV2 pretrained model across ESC50, ESC51, and UrbanSound8K (US8K) datasets. Analyses of confusion matrices and t-SNE visualizations further highlight SoundMLR's ability to generate compact, distinct feature clusters, enabling more robust discrimination between sound classes. Additionally, we introduce two innovative modules, Spectral Pooling Attention (SPA) and the Feature Pooling Layer (FPL), designed to optimize the MobileNetV2 backbone. Notably, the MobileNetV2 + FPL model, equipped with SoundMLR, achieves an impressive 92.16 % classification accuracy on the ESC51 dataset while reducing computational complexity by 24.5 %. Similarly, the MobileNetV2 + SPA model achieves a peak accuracy of 91.75 % on the ESC50 dataset, showcasing the complementary strengths of these modules. These findings offer valuable insights for the future development of efficient, scalable, and robust ESC systems. The source code for this study is publicly available at https://github.com/flchenwhu/ESC-SoundMLR.
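The abstract describes SoundMLR's hybrid loss as combining a classification objective with metric-learning principles, so that embeddings of the same sound class cluster tightly while different classes are pushed apart. The paper's exact formulation is not given here; the following is a minimal illustrative sketch, assuming the common pattern of a cross-entropy term plus a margin-based triplet term weighted by a hypothetical coefficient `alpha` (the function names and weighting are assumptions, not the authors' code).

```python
import numpy as np

def cross_entropy(logits, label):
    # Standard softmax cross-entropy for a single example (classification term).
    z = logits - logits.max()                      # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Metric-learning term: pull same-class embeddings together,
    # push different-class embeddings at least `margin` further away.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def hybrid_loss(logits, label, anchor, positive, negative, alpha=0.5):
    # Weighted sum of the classification and metric-learning objectives.
    # `alpha` (hypothetical) trades off accuracy against embedding compactness.
    return cross_entropy(logits, label) + alpha * triplet_loss(anchor, positive, negative)

# Example: a well-separated triplet contributes no metric penalty.
anchor = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])     # same-class embedding, nearby
negative = np.array([5.0, 0.0])     # different-class embedding, far away
loss = hybrid_loss(np.array([4.0, 0.0, 0.0]), 0, anchor, positive, negative)
```

In practice such a loss would be computed over mini-batches of UAV-noise and ESC50/US8K spectrogram embeddings produced by the MobileNetV2 backbone; the t-SNE clustering behaviour reported in the abstract is the qualitative effect of driving the triplet term toward zero.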
Pages: 13