Sound event localization and detection using element-wise attention gate and asymmetric convolutional recurrent neural networks

Cited by: 0
Authors
Yan, Lean [1]
Guo, Min [1]
Li, Zhiqiang [1]
Affiliation
[1] Shaanxi Normal Univ, Sch Comp Sci, Minist Educ, Key Lab Modern Teaching Technol, Xian 710119, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Sound event localization and detection; asymmetric convolution; context gating; squeeze excitation; element-wise attention gate;
DOI
10.3233/AIC-220125
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In sound event localization and detection (SELD), the standard square convolution kernel has insufficient representation ability, and recurrent neural networks usually ignore the differing importance of the elements within an input vector. This paper proposes an element-wise attention gate-asymmetric convolutional recurrent neural network (EleAttG-ACRNN) to improve SELD performance. First, a convolutional neural network with context gating and asymmetric squeeze excitation residual is constructed: asymmetric convolution enhances the capability of the square convolution kernel, squeeze excitation improves the interdependence between channels, and context gating weights the important features while suppressing the irrelevant ones. Next, to improve the expressiveness of the model, the element-wise attention gate is integrated into the bidirectional gated recurrent network, which highlights the importance of different elements within an input vector and further learns the temporal context information. Evaluation on the TAU Spatial Sound Events 2019-Ambisonic dataset shows the effectiveness of the proposed method: compared to a CRNN method, it improves SELD performance by up to 0.05 in error rate, 1.7% in F-score, 0.7 degrees in DOA error, and 4.5% in frame recall.
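The two gating ideas named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the abstract gives no equations, so the gate forms below are assumptions based on the way these mechanisms are commonly formulated, namely a sigmoid gate computed from the current input and the previous hidden state for the element-wise attention gate, and a sigmoid gate computed from the feature vector itself for context gating. All names (`element_wise_attention_gate`, `context_gating`, `W_x`, `W_h`, `b`) and the toy dimensions are hypothetical, and the recurrent unit that would consume the gated input is omitted.

```python
import math
import random


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def element_wise_attention_gate(x_t, h_prev, W_x, W_h, b):
    """Assumed gate form: a_t = sigmoid(W_x x_t + W_h h_prev + b).

    Returns the re-weighted input a_t * x_t (element-wise) and a_t itself.
    Each element of x_t gets its own attention weight in (0, 1).
    """
    pre = [p + q + r for p, q, r in zip(matvec(W_x, x_t), matvec(W_h, h_prev), b)]
    a_t = [sigmoid(z) for z in pre]
    gated = [a * x for a, x in zip(a_t, x_t)]
    return gated, a_t


def context_gating(x, W, b):
    """Assumed gate form: y = sigmoid(W x + b) * x (element-wise)."""
    gate = [sigmoid(z + bi) for z, bi in zip(matvec(W, x), b)]
    return [g * xi for g, xi in zip(gate, x)]


# Toy example with hypothetical sizes: 4 input elements, 3 hidden units.
rng = random.Random(0)
dim_x, dim_h = 4, 3
x_t = [0.5, -1.0, 2.0, 0.0]
h_prev = [0.0] * dim_h
W_x = [[rng.uniform(-1, 1) for _ in range(dim_x)] for _ in range(dim_x)]
W_h = [[rng.uniform(-1, 1) for _ in range(dim_h)] for _ in range(dim_x)]
b = [0.0] * dim_x

gated_x, attention = element_wise_attention_gate(x_t, h_prev, W_x, W_h, b)
print("attention weights:", attention)
print("gated input:", gated_x)
```

In a full model the gated input would replace `x_t` at every time step of the recurrent layer, so the network can emphasize some input elements over others before updating its hidden state.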
Pages: 147-157
Number of pages: 11
Related papers
50 in total
  • [31] SELD-TCN: Sound Event Localization & Detection via Temporal Convolutional Networks
    Guirguis, Karim
    Schorn, Christoph
    Guntoro, Andre
    Abdulatif, Sherif
    Yang, Bin
    28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020), 2021, : 16 - 20
  • [32] RECURRENT NEURAL NETWORKS FOR POLYPHONIC SOUND EVENT DETECTION IN REAL LIFE RECORDINGS
    Parascandolo, Giambattista
    Huttunen, Heikki
    Virtanen, Tuomas
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 6440 - 6444
  • [33] Polyphonic Sound Event Detection by Using Capsule Neural Networks
    Vesperini, Fabio
    Gabrielli, Leonardo
    Principi, Emanuele
    Squartini, Stefano
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2019, 13 (02) : 310 - 322
  • [34] End-to-End Polyphonic Sound Event Detection Using Convolutional Recurrent Neural Networks with Learned Time-Frequency Representation Input
    Cakir, Emre
    Virtanen, Tuomas
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018,
  • [35] Detection and Localization of Ultrasound Scatterers Using Convolutional Neural Networks
    Youn, Jihwan
    Ommen, Martin Lind
    Stuart, Matthias Bo
    Thomsen, Erik Vilain
    Larsen, Niels Bent
    Jensen, Jorgen Arendt
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2020, 39 (12) : 3855 - 3867
  • [36] Angiodysplasia detection and localization using deep convolutional neural networks
    Shvets, Alexey A.
    Iglovikov, Vladimir I.
    Rakhlin, Alexander
    Kalinin, Alexandr A.
    2018 17TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2018, : 612 - 617
  • [37] Simultaneous Object Detection and Localization using Convolutional Neural Networks
    Zahra Ouadiay, Fatima
    Bouftaih, Hamza
    Bouyakhf, El Houssine
    Majid Himmi, M.
    2018 INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND COMPUTER VISION (ISCV2018), 2018,
  • [38] End-to-end Convolutional Neural Networks for Sound Event Detection in Urban Environments
    Zinemanas, Pablo
    Cancela, Pablo
    Rocamora, Martin
    PROCEEDINGS OF THE 24TH CONFERENCE OF OPEN INNOVATIONS ASSOCIATION (FRUCT), 2019, : 533 - 539
  • [39] Convolutional Recurrent Neural Networks for Urban Sound Classification using Raw Waveforms
    Sang, Jonghee
    Park, Soomyung
    Lee, Junwoo
    2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2018, : 2444 - 2448
  • [40] Convolutional Neural Networks with Multi-task Loss for Polyphonic Sound Event Detection
    Liu, Huang
    Wang, Xiu
    Guan, Fa-Qian
    Hu, Jin-Sen
    PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND APPLICATION ENGINEERING (CSAE2018), 2018,