Environment sound classification using an attention-based residual neural network

Cited by: 30
Authors
Tripathi, Achyut Mani [1]
Mishra, Aakansha [1]
Affiliations
[1] Indian Inst Technol, Dept Comp Sci & Engn, Gauhati 781039, Assam, India
Keywords
Attention mechanism; Convolutional neural network; Explainable; Environmental sound classification; Residual network; Temporal relations; Recognition
DOI
10.1016/j.neucom.2021.06.031
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The complexity of environmental sounds poses numerous challenges for their classification. The performance of Environmental Sound Classification (ESC) depends greatly on how well the feature extraction technique captures generic and prototypical features of a sound. Silent and semantically irrelevant frames are ubiquitous in environmental sound recordings. To address these issues, we introduce a novel attention-based deep model that focuses on semantically relevant frames. The proposed attention-guided deep model efficiently learns the spatio-temporal relationships that exist in the spectrogram of a signal. The efficacy of the proposed method is evaluated on two widely used ESC datasets: ESC-10 and DCASE 2019 Task-1(A). The experimental results demonstrate that the proposed method yields performance comparable to state-of-the-art techniques. We obtained accuracy improvements of 11.50% and 19.50% over the baseline models of the ESC-10 and DCASE 2019 Task-1(A) datasets, respectively. To support the claim that the attention focuses on relevant regions, a visual analysis of the attention feature map is also presented. The resulting attention feature map shows that the model focuses only on the semantically relevant regions of the spectrogram while skipping the irrelevant ones. (c) 2021 Elsevier B.V. All rights reserved.
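The idea of weighting semantically relevant frames more heavily than silent ones can be illustrated with a generic soft-attention pooling step. The sketch below is not the architecture from the paper, which the abstract does not specify; the function names `softmax` and `attention_pool`, the scoring vector `w`, and the toy frame values are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, w):
    """Pool T per-frame feature vectors into one clip-level vector.

    frames: list of T feature vectors (each a list of D floats), e.g.
            spectrogram columns or per-time-step CNN features.
    w:      hypothetical scoring vector of length D; a real model would
            learn it during training.
    Returns (weights, pooled): the T attention weights and the D-dim
    attention-weighted sum of the frames.
    """
    # One scalar relevance score per frame (dot product with w).
    scores = [sum(wi * xi for wi, xi in zip(w, frame)) for frame in frames]
    # Normalise the scores so they sum to 1 across time.
    weights = softmax(scores)
    dim = len(frames[0])
    pooled = [
        sum(a * frame[d] for a, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return weights, pooled


# Toy clip (assumed values): two near-silent frames, one energetic frame.
frames = [[0.0, 0.1], [0.05, 0.0], [3.0, 2.5]]
w = [1.0, 1.0]  # assumed scoring vector, not taken from the paper
weights, pooled = attention_pool(frames, w)
```

In this toy setting the energetic frame receives almost all of the attention weight, so the pooled vector is dominated by it and the near-silent frames contribute very little.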
Pages: 409-423
Number of pages: 15