Sound event localization and detection using element-wise attention gate and asymmetric convolutional recurrent neural networks

Authors
Yan, Lean [1]
Guo, Min [1]
Li, Zhiqiang [1]
Affiliations
[1] Shaanxi Normal Univ, Sch Comp Sci, Minist Educ, Key Lab Modern Teaching Technol, Xian 710119, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Sound event localization and detection; asymmetric convolution; context gating; squeeze excitation; element-wise attention gate
DOI
10.3233/AIC-220125
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In sound event localization and detection (SELD), the standard square convolution kernel has limited representational ability, and recurrent neural networks usually ignore the differing importance of the elements within an input vector. This paper proposes an element-wise attention gate-asymmetric convolutional recurrent neural network (EleAttG-ACRNN) to improve SELD performance. First, a convolutional neural network with context gating and asymmetric squeeze excitation residual blocks is constructed: asymmetric convolution enhances the capability of the square convolution kernel, squeeze excitation strengthens the interdependence between channels, and context gating weights the important features while suppressing irrelevant ones. Next, to improve the expressiveness of the model, the element-wise attention gate is integrated into the bidirectional gated recurrent network, which highlights the importance of different elements within an input vector and further learns temporal context information. Evaluation on the TAU Spatial Sound Events 2019-Ambisonic dataset shows the effectiveness of the proposed method: compared to a CRNN method, it improves SELD performance by up to 0.05 in error rate, 1.7% in F-score, 0.7 degrees in DOA error, and 4.5% in frame recall.
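The abstract does not give the paper's equations, so the following is only a minimal sketch of the general idea shared by context gating and an element-wise attention gate: each element of an input vector is scaled by its own sigmoid gate, y = sigmoid(Wx + b) ⊙ x. The function name `elementwise_gate` and the values of `x`, `W`, and `b` are hypothetical, chosen for illustration. A gate placed inside a recurrent unit would typically also depend on the previous hidden state, which this sketch omits.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def elementwise_gate(x, weights, bias):
    """Scale each element of x by its own gate in (0, 1).

    gates = sigmoid(weights @ x + bias), output = gates * x (element-wise).
    This is a generic gating form, not the paper's exact formulation.
    """
    gates = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]
    return [g * xi for g, xi in zip(gates, x)]


# Hypothetical 3-dimensional input and parameters (illustration only).
x = [1.0, -2.0, 0.5]
W = [[0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
b = [10.0, 0.0, -10.0]  # large bias keeps an element, small bias suppresses it

gated = elementwise_gate(x, W, b)
print(gated)
```

With these illustrative parameters the first element passes almost unchanged, the second is halved, and the third is suppressed to nearly zero, which is the "weight the important features and suppress the irrelevant features" behaviour the abstract describes.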
Pages: 147-157
Page count: 11