Improved Patch-Mix Transformer and Contrastive Learning Method for Sound Classification in Noisy Environments

Times cited: 0
Authors
Chen, Xu [1]
Wang, Mei [1,2]
Kan, Ruixiang [3]
Qiu, Hongbing [3]
Affiliations
[1] Guilin Univ Technol, Coll Comp Sci & Engn, Guilin 541006, Peoples R China
[2] Guilin Univ Technol, Coll Phys & Elect Informat Engn, Guilin 541006, Peoples R China
[3] Guilin Univ Elect Technol, Sch Informat & Commun, Guilin 541006, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / No. 21
Funding
National Natural Science Foundation of China;
Keywords
data augmentation; contrastive learning; feature fusion; deep learning; transformer; urban environmental sound recognition;
DOI
10.3390/app14219711
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
In urban environments, noise significantly affects daily life and poses challenges for Environmental Sound Classification (ESC). Urban noise alters the structure of audio signals, which complicates feature extraction and classification for ESC methods. To address these challenges, this paper proposes a Contrastive Learning-based Audio Spectrogram Transformer (CL-Transformer) that incorporates a Patch-Mix mechanism and adaptive contrastive learning strategies, and that is trained with improved adaptive data augmentation techniques. First, a combination of data augmentation techniques is introduced to enrich the environmental sound data. Then, the Patch-Mix feature fusion scheme randomly mixes patches of the enhanced and noisy spectrograms during the Transformer's patch embedding. Furthermore, a novel contrastive learning scheme is introduced to compute the loss and improve model performance; it is designed to complement the Transformer model. Finally, experiments on the public ESC-50 and UrbanSound8K datasets achieved accuracies of 97.75% and 92.95%, respectively. To simulate the impact of noise in real urban environments, the model is also evaluated on the UrbanSound8K dataset with background noise added at different signal-to-noise ratios (SNRs). The experimental results show that the proposed framework performs well in noisy environments.
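The abstract names two concrete operations: mixing background noise into a clip at a chosen SNR, and Patch-Mix, which randomly swaps patches between two spectrograms at the patch-embedding stage. The sketch below illustrates both under stated assumptions; it is not the authors' implementation. The helper names `add_noise_at_snr`, `patchify`, and `patch_mix`, the 16×16 patch size, the uniform random choice of swapped patches, and the ratio `lam` are all hypothetical choices for illustration, since the abstract does not specify how patches are selected or how labels are weighted. The linear projection that would normally turn each flattened patch into an embedding is omitted.

```python
import numpy as np
from typing import Tuple


def add_noise_at_snr(signal: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Hypothetical helper: scale `noise` so that P_signal / P_noise hits `snr_db`, then add it."""
    noise = noise[: len(signal)]  # assumes the noise clip is at least as long as the signal
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(p_signal / (scale**2 * p_noise)), solved for scale
    scale = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + scale * noise


def patchify(spec: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split a (freq, time) spectrogram into non-overlapping patch x patch tiles,
    flattened to shape (num_patches, patch * patch), in row-major tile order."""
    f, t = spec.shape
    f, t = f - f % patch, t - t % patch  # drop edges that do not fill a whole patch
    tiles = spec[:f, :t].reshape(f // patch, patch, t // patch, patch)
    return tiles.transpose(0, 2, 1, 3).reshape(-1, patch * patch)


def patch_mix(
    tokens_a: np.ndarray, tokens_b: np.ndarray, lam: float, rng: np.random.Generator
) -> Tuple[np.ndarray, float]:
    """Hypothetical Patch-Mix: replace a random (1 - lam) fraction of the patches in
    tokens_a with the co-located patches of tokens_b. Returns the mixed patches and
    the fraction actually kept from tokens_a (usable as a loss weight, by assumption)."""
    n = tokens_a.shape[0]
    k = int(round((1.0 - lam) * n))               # number of patches taken from tokens_b
    idx = rng.choice(n, size=k, replace=False)    # distinct patch positions to swap
    mixed = tokens_a.copy()                       # leave the input untouched
    mixed[idx] = tokens_b[idx]
    return mixed, 1.0 - k / n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s, 440 Hz tone at 16 kHz
    noisy = add_noise_at_snr(clean, rng.standard_normal(16000), snr_db=5.0)

    spec_a = rng.random((128, 64))  # stand-ins for (freq x time) log-mel spectrograms
    spec_b = rng.random((128, 64))
    tokens, lam_eff = patch_mix(patchify(spec_a), patchify(spec_b), lam=0.75, rng=rng)
    print(tokens.shape, lam_eff)  # (32, 256) 0.75
```

Swapping whole patches at the same positions, rather than blending pixel values as in mixup, keeps each patch's frequency-time location intact, which matches the abstract's description of mixing at the patch-embedding stage. How the real model weights the two sources in its contrastive loss is not stated in the abstract.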
Pages: 18
Related papers
  • [1] Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification
    Bae, Sangmin
    Kim, June-Woo
    Cho, Won-Yang
    Baek, Hyerim
    Son, Soyoun
    Lee, Byungjo
    Ha, Changwan
    Tae, Kyongpil
    Kim, Sungnyun
    Yun, Se-Young
    INTERSPEECH 2023, 2023: 5436-5440
  • [2] Patch-Mix Transformer for Unsupervised Domain Adaptation: A Game Perspective
    Zhu, Jinjing
    Bai, Haotian
    Wang, Lin
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023: 3561-3571
  • [3] Patch-level contrastive embedding learning for respiratory sound classification
    Song, Wenjie
    Han, Jiqing
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2023, 80
  • [4] CONTRASTIVE EMBEDDING LEARNING METHOD FOR RESPIRATORY SOUND CLASSIFICATION
    Song, Wenjie
    Han, Jiqing
    Song, Hongwei
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 1275-1279
  • [5] A Framework Using Contrastive Learning for Classification with Noisy Labels
    Ciortan, Madalina
    Dupuis, Romain
    Peel, Thomas
    DATA, 2021, 6 (06)
  • [6] Contrastive Learning Based on Transformer for Hyperspectral Image Classification
    Hu, Xiang
    Li, Teng
    Zhou, Tong
    Liu, Yu
    Peng, Yuanxi
    APPLIED SCIENCES-BASEL, 2021, 11 (18)
  • [7] Vision Transformer With Contrastive Learning for Hyperspectral Image Classification
    Zhou, Heng
    Zhang, Xin
    Zhang, Chunlei
    Ma, Qiaoyu
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2023, 20
  • [8] Evaluation of Denoising Algorithms for Footsteps Sound Classification in Noisy Environments
    Brenes-Jimenez, Carlos
    Caravaca-Mora, Ronald
    Coto-Jimenez, Marvin
    2021 3RD IEEE INTERNATIONAL CONFERENCE ON BIOINSPIRED PROCESSING (BIP): A CLEI COSTA RICA 2021 EVENT, 2021
  • [9] Treatment Learning Causal Transformer for Noisy Image Classification
    Yang, Chao-Han Huck
    Hung, Danny I-Te
    Liu, Yi-Chieh
    Chen, Pin-Yu
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023: 6128-6139
  • [10] CycleGuardian: a framework for automatic respiratory sound classification based on improved deep clustering and contrastive learning
    Chu, Yun
    Wang, Qiuhao
    Zhou, Enze
    Fu, Ling
    Liu, Qian
    Zheng, Gang
    COMPLEX & INTELLIGENT SYSTEMS, 2025, 11 (04)