Multi-channel Convolutional Neural Networks with Multi-level Feature Fusion for Environmental Sound Classification

Cited by: 7
Authors
Chong, Dading [1]
Zou, Yuexian [1, 2]
Wang, Wenwu [3]
Affiliations
[1] Peking Univ, Sch ECE, ADSPLAB, Shenzhen, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
[3] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford, Surrey, England
Source
Keywords
Environmental sound classification; Multi-channel deep convolutional neural networks; End-to-end; Multi-level feature fusion
DOI
10.1007/978-3-030-05716-9_13
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Learning acoustic models directly from the raw waveform is an effective method for Environmental Sound Classification (ESC), where sound events often exhibit vast diversity in temporal scales. ESC methods based on convolutional neural networks (CNNs) have achieved state-of-the-art results. However, their performance is affected significantly by the number of convolutional layers and by the choice of kernel size in the first convolutional layer. In addition, most existing studies have ignored the ability of CNNs to learn hierarchical features from environmental sounds. Motivated by these findings, this paper designs parallel convolutional filters of different sizes in the first convolutional layer to extract multi-time-resolution features, with the aim of enhancing the feature representation. Inspired by VGG networks, we build our deep CNNs by stacking 1-D convolutional layers that use very small filters, except in the first layer. Finally, we extend the model with a multi-level feature aggregation technique to boost classification performance. Experimental results on UrbanSound8K, ESC-50, and ESC-10 show that the proposed method outperforms state-of-the-art end-to-end methods for environmental sound classification in terms of classification accuracy.
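The three ideas in the abstract (parallel first-layer filters of different sizes, a stack of very small 1-D filters, and aggregation of features from several depths) can be illustrated with a toy sketch. This is not the authors' network: the kernel sizes (11, 51, 101), the stride, the depth, the single channel per branch, the random untrained weights, and the use of global max pooling for aggregation are all assumptions chosen only to keep the example small and dependency-free.

```python
import random


def conv1d(signal, kernel, stride=1):
    """Valid-mode 1-D cross-correlation of a list of floats with one kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def relu(values):
    """Element-wise rectifier."""
    return [max(0.0, v) for v in values]


def random_kernel(rng, size):
    """Untrained stand-in for a learned filter (assumption)."""
    return [rng.uniform(-0.1, 0.1) for _ in range(size)]


def extract_features(waveform, first_kernels, small_kernels, stride):
    """Run every first-layer branch, stack small filters, and fuse all levels.

    Returns the first-layer feature maps and the fused feature vector.
    """
    branches, features = [], []
    for first in first_kernels:
        # Parallel first-layer filters: each length sees a different time scale.
        fmap = relu(conv1d(waveform, first, stride))
        branches.append(fmap)
        levels = [fmap]
        # VGG-style stack: very small filters after the first layer.
        for small in small_kernels:
            fmap = relu(conv1d(fmap, small))
            levels.append(fmap)
        # Multi-level aggregation: pool every depth and concatenate.
        features.extend(max(level) for level in levels)
    return branches, features


rng = random.Random(0)
waveform = [rng.uniform(-1.0, 1.0) for _ in range(256)]  # fake raw audio
kernel_sizes = (11, 51, 101)  # hypothetical first-layer sizes
first_kernels = [random_kernel(rng, size) for size in kernel_sizes]
small_kernels = [random_kernel(rng, 3) for _ in range(2)]  # two size-3 layers

branches, features = extract_features(
    waveform, first_kernels, small_kernels, stride=4
)
print([len(b) for b in branches])  # first-layer output length per branch
print(len(features))               # 3 branches x 3 levels
```

In a real model each branch would hold many learned channels and the fused vector would feed a classifier; the sketch only shows how the fused vector collects one pooled value from every depth of every branch.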
Pages: 157-168
Page count: 12