Evaluating metric and contrastive learning in pretrained models for environmental sound classification

Cited by: 0
Authors
Chen, Feilong [1 ]
Zhu, Zhenjun [1 ]
Sun, Chengli [1 ,2 ]
Xia, Linqing [3 ]
Affiliations
[1] Nanchang Hangkong Univ, Sch Informat & Engn, Nanchang 330063, Peoples R China
[2] Guangzhou Maritime Univ, Sch Informat & Commun Engn, Guangzhou 510725, Peoples R China
[3] Shanghai Digiot Technol Co Ltd, Shanghai 200082, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Environmental sound classification; Metric learning; Contrastive learning; Lightweight pretrained model; Unmanned aerial vehicle;
DOI
10.1016/j.apacoust.2025.110593
Chinese Library Classification
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
Environmental Sound Classification (ESC) has advanced significantly with the advent of deep learning techniques. This study conducts a comprehensive evaluation of contrastive and metric learning approaches in ESC, introducing the ESC51 dataset, an extension of the ESC50 benchmark that incorporates noise samples from quadrotor Unmanned Aerial Vehicles (UAVs). To enhance classification performance and the discriminative power of embedding spaces, we propose a novel metric learning-based approach, SoundMLR, which employs a hybrid loss function emphasizing metric learning principles. Experimental results demonstrate that SoundMLR consistently outperforms contrastive learning methods in terms of classification accuracy and inference latency, particularly when applied to the lightweight MobileNetV2 pretrained model across the ESC50, ESC51, and UrbanSound8K (US8K) datasets. Analyses of confusion matrices and t-SNE visualizations further highlight SoundMLR's ability to generate compact, distinct feature clusters, enabling more robust discrimination between sound classes. Additionally, we introduce two innovative modules, Spectral Pooling Attention (SPA) and the Feature Pooling Layer (FPL), designed to optimize the MobileNetV2 backbone. Notably, the MobileNetV2 + FPL model, equipped with SoundMLR, achieves an impressive 92.16% classification accuracy on the ESC51 dataset while reducing computational complexity by 24.5%. Similarly, the MobileNetV2 + SPA model achieves a peak accuracy of 91.75% on the ESC50 dataset, showcasing the complementary strengths of these modules. These findings offer valuable insights for the future development of efficient, scalable, and robust ESC systems. The source code for this study is publicly available at https://github.com/flchenwhu/ESC-SoundMLR.
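The abstract describes SoundMLR only at a high level: a hybrid loss that pairs a classification objective with a metric-learning term on top of a lightweight MobileNetV2 backbone. As a rough illustration (not the paper's actual implementation), the PyTorch sketch below assumes the hybrid loss is a weighted sum of softmax cross-entropy and an in-batch triplet margin loss over learned embeddings; the class count (51, matching ESC51), embedding size, weight `alpha`, margin, and the names `HybridMetricClassifier` and `hybrid_loss` are all hypothetical choices.

```python
# Illustrative sketch only: the exact composition of SoundMLR's hybrid loss is not
# given in the abstract. Assumed here: cross-entropy (classification) + triplet margin
# loss (metric learning) over embeddings from an ImageNet-pretrained MobileNetV2.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import mobilenet_v2


class HybridMetricClassifier(nn.Module):
    """MobileNetV2 backbone with an embedding head (metric loss) and a classifier head."""

    def __init__(self, num_classes: int = 51, embed_dim: int = 128):
        super().__init__()
        backbone = mobilenet_v2(weights="IMAGENET1K_V1")   # pretrained weights, downloaded on first use
        self.features = backbone.features                  # convolutional feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)                # global average pooling
        self.embed = nn.Linear(1280, embed_dim)            # embedding head used by the metric term
        self.classify = nn.Linear(embed_dim, num_classes)  # logits used by cross-entropy

    def forward(self, x):
        h = self.pool(self.features(x)).flatten(1)         # (N, 1280)
        z = self.embed(h)                                   # (N, embed_dim)
        return z, self.classify(z)


def hybrid_loss(z, logits, labels, alpha=0.5, margin=1.0):
    """Weighted sum of cross-entropy and a naive in-batch triplet margin loss."""
    ce = F.cross_entropy(logits, labels)
    anchors, positives, negatives = [], [], []
    for i in range(labels.size(0)):
        same = torch.nonzero(labels == labels[i], as_tuple=True)[0]
        diff = torch.nonzero(labels != labels[i], as_tuple=True)[0]
        same = same[same != i]                              # exclude the anchor itself
        if len(same) and len(diff):                         # need one positive and one negative
            anchors.append(z[i]); positives.append(z[same[0]]); negatives.append(z[diff[0]])
    if anchors:
        triplet = F.triplet_margin_loss(
            torch.stack(anchors), torch.stack(positives), torch.stack(negatives), margin=margin)
    else:
        triplet = torch.zeros((), device=z.device)          # no valid triplets in this batch
    return ce + alpha * triplet


if __name__ == "__main__":
    # Toy forward/backward pass: log-mel spectrograms are commonly replicated to three
    # channels so they match the ImageNet-pretrained stem.
    model = HybridMetricClassifier()
    x = torch.randn(8, 3, 224, 224)
    labels = torch.randint(0, 51, (8,))
    z, logits = model(x)
    hybrid_loss(z, logits, labels).backward()
```

The triplet term is what would pull same-class embeddings together and push different-class embeddings apart, consistent with the compact, well-separated t-SNE clusters the abstract reports; the in-batch mining here is deliberately naive, and practical implementations typically use harder mining strategies or other metric losses.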
Pages: 13
Related Papers
50 records in total
  • [21] Alam, Minhaj Nur; Leng, Theodore; Hallak, Joelle; Rubin, Daniel. Contrastive learning improves representation and transferability of diabetic retinopathy classification models. INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2022, 63 (07).
  • [22] Guzhov, Andrey; Raue, Federico; Hees, Jorn; Dengel, Andreas. ESResNet: Environmental Sound Classification Based on Visual Domain Models. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021: 4933 - 4940.
  • [23] Bae, Sangmin; Kim, June-Woo; Cho, Won-Yang; Baek, Hyerim; Son, Soyoun; Lee, Byungjo; Ha, Changwan; Tae, Kyongpil; Kim, Sungnyun; Yun, Se-Young. Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification. INTERSPEECH 2023, 2023: 5436 - 5440.
  • [24] Hu, Pengfei; Liu, Wenju; Jiang, Wei. Combining frame and segment based models for environmental sound classification. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012: 2501 - 2504.
  • [25] Olivares, Emilio Sanchez; Boekhout, Hanjo D.; Saxena, Akrati; Takes, Frank W. A Framework for Empirically Evaluating Pretrained Link Prediction Models. COMPLEX NETWORKS & THEIR APPLICATIONS XII, VOL 1, COMPLEX NETWORKS 2023, 2024, 1141: 150 - 161.
  • [26] Zhou, Shuyan; Alon, Uri; Agarwal, Sumit; Neubig, Graham. CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code. 2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023: 13921 - 13937.
  • [27] Kan, Shichao; He, Zhiquan; Cen, Yigang; Li, Yang; Mladenovic, Vladimir; He, Zhihai. Contrastive Bayesian Analysis for Deep Metric Learning. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (06): 7220 - 7238.
  • [28] Zhang, Yan; Lv, Danjv; Zhao, Yili. Multiple-View Active Learning for Environmental Sound Classification. INTERNATIONAL JOURNAL OF ONLINE ENGINEERING, 2016, 12 (12): 49 - 54.
  • [29] Yan, Bingqi; Zhao, Geng; Song, Lexue; Yu, Yanwei; Dong, Junyu. PreCLN: Pretrained-based contrastive learning network for vehicle trajectory prediction. WORLD WIDE WEB, 2023, 26: 1853 - 1875.
  • [30] Ucan, Alaettin; Dorterler, Murat; Akcapinar Sezer, Ebru. A study of Turkish emotion classification with pretrained language models. JOURNAL OF INFORMATION SCIENCE, 2022, 48 (06): 857 - 865.