Evaluating metric and contrastive learning in pretrained models for environmental sound classification

Citations: 0
Authors
Chen, Feilong [1]
Zhu, Zhenjun [1]
Sun, Chengli [1, 2]
Xia, Linqing [3]
Affiliations
[1] Nanchang Hangkong Univ, Sch Informat & Engn, Nanchang 330063, Peoples R China
[2] Guangzhou Maritime Univ, Sch Informat & Commun Engn, Guangzhou 510725, Peoples R China
[3] Shanghai Digiot Technol Co Ltd, Shanghai 200082, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Environmental sound classification; Metric learning; Contrastive learning; Lightweight pretrained model; Unmanned aerial vehicle
DOI
10.1016/j.apacoust.2025.110593
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Environmental Sound Classification (ESC) has advanced significantly with the advent of deep learning techniques. This study conducts a comprehensive evaluation of contrastive and metric learning approaches in ESC, introducing the ESC51 dataset, an extension of the ESC50 benchmark that incorporates noise samples from quadrotor Unmanned Aerial Vehicles (UAVs). To enhance classification performance and the discriminative power of embedding spaces, we propose a novel metric learning-based approach, SoundMLR, which employs a hybrid loss function emphasizing metric learning principles. Experimental results demonstrate that SoundMLR consistently outperforms contrastive learning methods in terms of classification accuracy and inference latency, particularly when applied to the lightweight MobileNetV2 pretrained model across ESC50, ESC51, and UrbanSound8K (US8K) datasets. Analyses of confusion matrices and t-SNE visualizations further highlight SoundMLR's ability to generate compact, distinct feature clusters, enabling more robust discrimination between sound classes. Additionally, we introduce two innovative modules, Spectral Pooling Attention (SPA) and the Feature Pooling Layer (FPL), designed to optimize the MobileNetV2 backbone. Notably, the MobileNetV2 + FPL model, equipped with SoundMLR, achieves an impressive 92.16 % classification accuracy on the ESC51 dataset while reducing computational complexity by 24.5 %. Similarly, the MobileNetV2 + SPA model achieves a peak accuracy of 91.75 % on the ESC50 dataset, showcasing the complementary strengths of these modules. These findings offer valuable insights for the future development of efficient, scalable, and robust ESC systems. The source code for this study is publicly available at https://github.com/flchenwhu/ESC-SoundMLR.
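The abstract describes SoundMLR as using a hybrid loss that emphasizes metric learning, but it does not state which terms the loss combines. As a rough illustration of the general idea only, the sketch below pairs a softmax cross-entropy term with a triplet margin term on embeddings. The choice of a triplet term, the `weight` and `margin` parameters, and all function names are assumptions made for this example; they are not taken from the paper or its repository.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for a single sample (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_margin(anchor, positive, negative, margin=1.0):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative) + margin
    return max(0.0, gap)

def hybrid_loss(logits, label, anchor, positive, negative,
                weight=0.5, margin=1.0):
    """Hypothetical hybrid loss: classification term plus a weighted
    metric term. The weighting scheme is an assumption for illustration."""
    ce = cross_entropy(logits, label)
    metric = triplet_margin(anchor, positive, negative, margin)
    return ce + weight * metric
```

When the positive sample already sits closer to the anchor than the negative by at least the margin, the metric term vanishes and only the classification term remains; otherwise the metric term penalizes the embedding layout, which is the kind of pressure toward compact, well-separated clusters that the abstract attributes to metric learning.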
Pages: 13