Environmental Sound Classification via Time-Frequency Attention and Framewise Self-Attention-Based Deep Neural Networks

Cited by: 16
Authors
Wu, Bo [1 ]
Zhang, Xiao-Ping [1 ]
Affiliations
[1] Ryerson Univ, Dept Elect Comp & Biomed Engn, Toronto, ON M5B 2K3, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Time-frequency analysis; Spectrogram; Clutter; Internet of Things; Deep learning; Feature extraction; Acoustics; Deep neural networks (DNNs); discriminative feature fusion; environmental sound; framewise self-attention; time-frequency attention;
DOI
10.1109/JIOT.2021.3098464
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Environmental sound classification (ESC) is crucial to understanding the surroundings in Internet of Things (IoT) applications. State-of-the-art deep learning approaches do not perform well at ESC in the presence of various kinds of clutter interference, which is common in IoT scenarios. In this article, we present a novel deep neural network framework based on time-frequency attention and framewise self-attention (TFFS-DNN). It consists of two major novel architectures: 1) a gradient and latent feature-based DNN that generates our time-frequency attention, which accurately locates the relevant time-frequency (i.e., spectral) features, and 2) a self-attention normalization DNN that generates our framewise self-attentions, which indicate the relevance of individual frames. By combining these two distinct and complementary kinds of attention with spectrograms, TFFS-DNN identifies the importance or relevance of the sounds in terms of time, frequency, and frame, which helps to distinguish clutter such as background sounds and, to some extent, supports model interpretation. Thus, the proposed TFFS-DNN can classify environmental sounds with clutter. An evaluation on four real-world environmental sound data sets demonstrates the superior performance of the proposed framework over several state-of-the-art schemes. Notably, we achieve 79.23% classification accuracy on the UrbanSound data set, a raw environmental sound data set that is full of clutter. An ablation study demonstrates a relative 3%-9% improvement in classification accuracy for the proposed framework over the baseline deep model.
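The abstract does not say how the two attentions are computed or fused, so the following is only a minimal sketch of the general idea of reweighting a spectrogram with a time-frequency attention map and per-frame attention weights. It is not the authors' TFFS-DNN. The function names, the sigmoid mask, the softmax over frames, and the random scores standing in for network outputs are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def apply_attentions(spec, tf_scores, frame_scores):
    """Reweight a spectrogram with two attention maps (hypothetical helper).

    spec:         (n_freq, n_frames) non-negative spectrogram
    tf_scores:    (n_freq, n_frames) unnormalized time-frequency relevance
    frame_scores: (n_frames,) unnormalized per-frame relevance
    """
    # Assumption: a sigmoid turns scores into an elementwise mask in (0, 1).
    tf_attention = 1.0 / (1.0 + np.exp(-tf_scores))
    # Assumption: a softmax makes the frame weights sum to 1.
    frame_attention = softmax(frame_scores)
    # Broadcast the frame weights across the frequency axis.
    attended = spec * tf_attention * frame_attention[np.newaxis, :]
    return attended, tf_attention, frame_attention

# Random placeholders stand in for a real spectrogram and for the
# scores that trained networks would produce.
rng = np.random.default_rng(0)
spec = np.abs(rng.normal(size=(64, 100)))
tf_scores = rng.normal(size=(64, 100))
frame_scores = rng.normal(size=100)

attended, tf_att, fr_att = apply_attentions(spec, tf_scores, frame_scores)
```

In this sketch, bins with a low time-frequency score and frames with a low frame score are both suppressed, which is one simple way the two kinds of attention could complement each other when clutter is present.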
Pages: 3416-3428
Number of pages: 13
Related papers
50 in total
  • [1] Temporal Self-Attention-Based Residual Network for An Environmental Sound Classification
    Tripathi, Achyut Mani
    Paul, Konark
    [J]. INTERSPEECH 2022, 2022, : 1516 - 1520
  • [2] Self-attention-based convolutional neural network and time-frequency common spatial pattern for enhanced motor imagery classification
    Zhang, Rui
    Liu, Guoyang
    Wen, Yiming
    Zhou, Weidong
    [J]. JOURNAL OF NEUROSCIENCE METHODS, 2023, 398
  • [3] A Self-Attention-Based Deep Convolutional Neural Networks for IIoT Networks Intrusion Detection
    Alshehri, Mohammed S.
    Saidani, Oumaima
    Alrayes, Fatma S.
    Abbasi, Saadullah Farooq
    Ahmad, Jawad
    [J]. IEEE ACCESS, 2024, 12 : 45762 - 45772
  • [4] Deep attention-based neural networks for explainable heart sound classification
    Ren, Zhao
    Qian, Kun
    Dong, Fengquan
    Dai, Zhenyu
    Nejdl, Wolfgang
    Yamamoto, Yoshiharu
    Schuller, Bjoern W.
    [J]. MACHINE LEARNING WITH APPLICATIONS, 2022, 9
  • [6] Self-attention-based neural networks for refining the overlength product titles
    Lin, Yuming
    Fu, Yu
    Li, You
    Cai, Guoyong
    Zhou, Aoying
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (18) : 28501 - 28519
  • [7] Self-Attention-Based Deep Feature Fusion for Remote Sensing Scene Classification
    Cao, Ran
    Fang, Leyuan
    Lu, Ting
    He, Nanjun
    [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2021, 18 (01) : 43 - 47
  • [8] Environmental Sound Classification based on Time-frequency Representation
    Thwe, Khine Zar
    War, Nu
    [J]. 2017 18TH IEEE/ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING (SNPD 2017), 2017, : 251 - 255
  • [9] Automated vein verification using self-attention-based convolutional neural networks
    Kocakulak, Mustafa
    Avci, Adem
    Acir, Nurettin
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2023, 230
  • [10] An Attention-Based Time-Frequency Pyramid Pooling Strategy in Deep Convolutional Networks for Acoustic Scene Classification
    Jiang, Pengxu
    Yang, Yang
    Zou, Cairong
    Wang, Qingyun
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 296 - 300