A MULTI-CHANNEL TEMPORAL ATTENTION CONVOLUTIONAL NEURAL NETWORK MODEL FOR ENVIRONMENTAL SOUND CLASSIFICATION

Cited by: 24
Authors
Wang, You [1]
Feng, Chuyao [1]
Anderson, David V. [1]
Affiliations
[1] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
Keywords
Environmental sound classification; convolutional neural network; temporal attention; multi-channel
DOI
10.1109/ICASSP39728.2021.9413498
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Recently, many attention-based deep neural networks have emerged and achieved state-of-the-art performance in environmental sound classification. The essence of the attention mechanism is to assign contribution weights to different parts of the features, namely channels, spectral or spatial contents, and temporal frames. In this paper, we propose an effective convolutional neural network structure with a multi-channel temporal attention (MCTA) block, which applies a temporal attention mechanism within each channel of the embedded features to extract channel-wise relevant temporal information. This structure yields a distinct attention vector for each channel, which enables the network to fully exploit the relevant temporal information in different channels. We evaluate the model on ESC-50 and its subset ESC-10, along with the development sets of DCASE 2018 and 2019. In our experiments, MCTA performed better than the single-channel temporal attention model and the non-attention model with the same number of parameters. Furthermore, we compared our model with several successful attention-based models and obtained competitive results with a relatively lighter network.
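The idea of one temporal attention vector per channel can be illustrated with a minimal NumPy sketch. This is not the authors' MCTA block: the feature layout (channels, frequency bins, frames), the linear scoring over the frequency axis, and the names `multi_channel_temporal_attention`, `w`, and `b` are all assumptions made for illustration, since the abstract does not specify how the attention scores are computed.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_channel_temporal_attention(features, w, b):
    """Hypothetical per-channel temporal attention.

    features: (C, F, T) embedded features (channels, frequency bins, frames)
    w:        (C, F)    assumed channel-specific scoring weights
    b:        (C,)      assumed channel-specific biases
    Returns the reweighted features (C, F, T) and the attention matrix (C, T).
    """
    # One score per channel and frame, using that channel's own weights.
    scores = np.einsum("cf,cft->ct", w, features) + b[:, None]
    # Normalize over time separately for each channel, so every channel
    # gets its own attention vector over the T frames.
    attn = softmax(scores, axis=-1)
    # Broadcast each channel's attention vector across the frequency axis.
    weighted = features * attn[:, None, :]
    return weighted, attn

rng = np.random.default_rng(0)
C, F, T = 4, 8, 10
features = rng.standard_normal((C, F, T))
w = rng.standard_normal((C, F))
b = np.zeros(C)
weighted, attn = multi_channel_temporal_attention(features, w, b)
```

A single-channel temporal attention baseline would, under the same assumptions, compute one attention vector of length T and share it across all channels; here `attn` has one row per channel instead.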
Pages: 930-934
Number of pages: 5