Evaluating metric and contrastive learning in pretrained models for environmental sound classification

Cited by: 0
Authors
Chen, Feilong [1 ]
Zhu, Zhenjun [1 ]
Sun, Chengli [1 ,2 ]
Xia, Linqing [3 ]
Affiliations
[1] Nanchang Hangkong Univ, Sch Informat & Engn, Nanchang 330063, Peoples R China
[2] Guangzhou Maritime Univ, Sch Informat & Commun Engn, Guangzhou 510725, Peoples R China
[3] Shanghai Digiot Technol Co Ltd, Shanghai 200082, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Environmental sound classification; Metric learning; Contrastive learning; Lightweight pretrained model; Unmanned aerial vehicle;
DOI
10.1016/j.apacoust.2025.110593
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Environmental Sound Classification (ESC) has advanced significantly with the advent of deep learning techniques. This study conducts a comprehensive evaluation of contrastive and metric learning approaches in ESC, introducing the ESC51 dataset, an extension of the ESC50 benchmark that incorporates noise samples from quadrotor Unmanned Aerial Vehicles (UAVs). To enhance classification performance and the discriminative power of embedding spaces, we propose a novel metric learning-based approach, SoundMLR, which employs a hybrid loss function emphasizing metric learning principles. Experimental results demonstrate that SoundMLR consistently outperforms contrastive learning methods in terms of classification accuracy and inference latency, particularly when applied to the lightweight MobileNetV2 pretrained model across ESC50, ESC51, and UrbanSound8K (US8K) datasets. Analyses of confusion matrices and t-SNE visualizations further highlight SoundMLR's ability to generate compact, distinct feature clusters, enabling more robust discrimination between sound classes. Additionally, we introduce two innovative modules, Spectral Pooling Attention (SPA) and the Feature Pooling Layer (FPL), designed to optimize the MobileNetV2 backbone. Notably, the MobileNetV2 + FPL model, equipped with SoundMLR, achieves an impressive 92.16 % classification accuracy on the ESC51 dataset while reducing computational complexity by 24.5 %. Similarly, the MobileNetV2 + SPA model achieves a peak accuracy of 91.75 % on the ESC50 dataset, showcasing the complementary strengths of these modules. These findings offer valuable insights for the future development of efficient, scalable, and robust ESC systems. The source code for this study is publicly available at https://github.com/flchenwhu/ESC-SoundMLR.
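The abstract describes SoundMLR's hybrid loss only at a high level (a classification objective combined with a metric-learning term that pulls same-class embeddings together and pushes different-class embeddings apart). As a rough illustration of that general idea, not the paper's actual formulation, the sketch below combines softmax cross-entropy with a triplet margin term; the function names, the weight `alpha`, and the `margin` value are all illustrative assumptions.

```python
import math

def cross_entropy(logits, label):
    # Numerically stable softmax cross-entropy for one example:
    # log-sum-exp of the logits minus the logit of the true class.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def euclidean(a, b):
    # Euclidean distance between two embedding vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.5):
    # Metric-learning term: penalize when the same-class (positive)
    # pair is not closer than the different-class (negative) pair
    # by at least `margin`.
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def hybrid_loss(logits, label, anchor, positive, negative, alpha=0.5):
    # Weighted sum of the classification term and the metric term.
    return cross_entropy(logits, label) + alpha * triplet_loss(anchor, positive, negative)
```

In practice such a loss would be computed over mini-batches of embeddings from the pretrained backbone; this single-example version only shows how the two objectives are blended.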
Pages: 13