Environmental Sound Classification Based on Multi-temporal Resolution Convolutional Neural Network Combining with Multi-level Features

Cited by: 12
Authors
Zhu, Boqing [1]
Xu, Kele [1, 2]
Wang, Dezhi [3]
Zhang, Lilun [3]
Li, Bo [4]
Peng, Yuxing [1]
Affiliations
[1] Natl Univ Def Technol, Sci & Technol Parallel & Distributed Lab, Changsha, Hunan, Peoples R China
[2] Natl Univ Def Technol, Coll Informat Commun, Wuhan, Hubei, Peoples R China
[3] Natl Univ Def Technol, Coll Meteorol & Oceanog, Changsha, Hunan, Peoples R China
[4] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
Keywords
Audio scene classification; Multi-temporal resolution; Multi-level; Convolutional neural network
DOI
10.1007/978-3-030-00767-6_49
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Motivated by the fact that the characteristics of different sound classes vary widely across temporal scales and hierarchical levels, a novel deep convolutional neural network (CNN) architecture is proposed for the environmental sound classification task. The network takes raw waveforms as input and uses a set of separate parallel CNNs with different convolutional filter sizes and strides to learn feature representations at multiple temporal resolutions. The proposed architecture also aggregates hierarchical features from multiple CNN layers for classification through direct connections between convolutional layers, going beyond the single-level CNN features employed by the majority of previous studies. This architecture also improves the flow of information and avoids the vanishing gradient problem. The combination of multi-level features boosts classification performance significantly. Comparative experiments are conducted on two datasets: the environmental sound classification dataset (ESC-50) and the DCASE 2017 audio scene classification dataset. Results demonstrate that the proposed method is highly effective in these classification tasks by employing multi-temporal resolution and multi-level features, and that it outperforms previous methods that account only for single-level features.
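The two ideas in the abstract, parallel convolutions at different temporal resolutions and features pooled from more than one layer, can be illustrated with a minimal pure-Python sketch. It is not the authors' network: the function names, the random untrained filters, the (kernel size, stride) pairs, the two-layer depth, and the use of global max pooling are all assumptions chosen for illustration.

```python
import random


def conv1d(signal, kernel, stride):
    """Valid 1-D cross-correlation with the given stride, followed by ReLU."""
    k = len(kernel)
    out = []
    for start in range(0, len(signal) - k + 1, stride):
        acc = sum(signal[start + i] * kernel[i] for i in range(k))
        out.append(max(0.0, acc))
    return out


def multi_resolution_multi_level(waveform, branch_specs, second_kernel_size, rng):
    """Run one branch per (kernel_size, stride) pair and pool every level.

    All names and hyperparameters here are hypothetical; filters are random
    rather than learned, so only the data flow is meaningful.
    """
    features = []
    for kernel_size, stride in branch_specs:
        # Level 1: the filter length and stride set the temporal resolution.
        k1 = [rng.uniform(-1, 1) for _ in range(kernel_size)]
        level1 = conv1d(waveform, k1, stride)
        # Level 2: a deeper layer stacked on the level-1 feature map.
        k2 = [rng.uniform(-1, 1) for _ in range(second_kernel_size)]
        level2 = conv1d(level1, k2, 1)
        # Multi-level aggregation: pool each level and keep both summaries,
        # instead of passing only the deepest layer to the classifier.
        features.append(max(level1))
        features.append(max(level2))
    return features


rng = random.Random(0)
waveform = [rng.uniform(-1, 1) for _ in range(1000)]  # stand-in for raw audio
specs = [(11, 1), (51, 5), (101, 10)]  # assumed fine, medium, coarse branches
features = multi_resolution_multi_level(waveform, specs, 3, rng)
print(len(features))  # 3 branches x 2 levels = 6 values
```

In a trained model the concatenated vector would feed a classifier; here it only shows how short filters with small strides and long filters with large strides view the same waveform at different time scales.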
Pages: 528-537
Page count: 10