Sound Event Localization and Detection Using Parallel Multi-attention Enhancement

Cited by: 1
Authors
Chen, Zhengyu [1]
Huang, Qinghua [1]
Affiliations
[1] Shanghai Univ, Sch Commun & Informat Engn, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Sound event localization and detection; Parallel multi-attention; Global information; Feature fusion; DEEP NEURAL-NETWORKS; RECOGNITION;
DOI
10.1007/s00034-023-02489-x
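Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808; 0809;
Abstract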
As a combination of sound event detection and direction-of-arrival estimation, sound event localization and detection (SELD) is an emerging audio signal processing task that is widely applied in many areas. A popular convolutional recurrent neural network (CRNN)-based method uses a convolutional neural network (CNN) to extract high-level spatial features from manually designed features and a recurrent neural network to model sequential context information. Some studies have shown that a standard CNN is not robust in challenging acoustic environments such as those with overlapping, moving and discontinuous sources. To improve SELD performance in more complex acoustic scenes, this paper proposes parallel multi-attention enhancement (PMAE), a convolution enhancement method that boosts the representation ability of the CNN. PMAE consists of attention feature enhancement (AFE) and a parallel multi-attention (PMA) block. PMA, embedded into AFE, extracts enhanced global-local features using efficient attention modules along different dimensions. AFE, as a feature fusion strategy, fuses multi-scale enhanced features to improve feature representation. AFE performs well on overlapping sources. PMA adequately extracts characteristic information of different sound events and, when combined with AFE, performs better on moving and discontinuous sources. Based on this framework, the SELD system remains robust when the target sources are moving and overlapping with unknown interference classes. Simulations show that the proposed PMAE substantially improves SELD performance without other techniques such as data augmentation and post-processing.
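The abstract describes PMA as attention modules applied along different dimensions in parallel, but it does not specify the modules themselves. The following is only a minimal sketch of that general idea, not the authors' method. It assumes a feature map indexed as channel x time x frequency, one gate per axis computed from a global mean followed by a sigmoid, and fusion by averaging the three branches plus a residual connection. The names `axis_gates` and `parallel_multi_attention` are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def axis_gates(x, axis):
    """Return one sigmoid gate per index along `axis` (0=channel, 1=time, 2=frequency).

    Each gate is computed from the mean of x over the other two axes
    (an assumed, squeeze-style global descriptor).
    """
    C, T, F = len(x), len(x[0]), len(x[0][0])
    sizes = (C, T, F)
    sums = [0.0] * sizes[axis]
    for c in range(C):
        for t in range(T):
            for f in range(F):
                sums[(c, t, f)[axis]] += x[c][t][f]
    count = (C * T * F) / sizes[axis]  # elements averaged per gate
    return [sigmoid(s / count) for s in sums]


def parallel_multi_attention(x):
    """Gate x along channel, time and frequency in parallel, then fuse.

    Assumed fusion: average of the three gated branches plus the input
    itself as a residual, i.e. out = x * (1 + mean of the three gates).
    """
    C, T, F = len(x), len(x[0]), len(x[0][0])
    # All three branches read the same input, so they are independent of each other.
    gc, gt, gf = (axis_gates(x, a) for a in range(3))
    out = [[[0.0] * F for _ in range(T)] for _ in range(C)]
    for c in range(C):
        for t in range(T):
            for f in range(F):
                g = (gc[c] + gt[t] + gf[f]) / 3.0
                out[c][t][f] = x[c][t][f] * (1.0 + g)
    return out


if __name__ == "__main__":
    # Toy 2 x 3 x 4 feature map (channels x time frames x frequency bins).
    x = [[[float(c + t + f) for f in range(4)] for t in range(3)] for c in range(2)]
    y = parallel_multi_attention(x)
    print(len(y), len(y[0]), len(y[0][0]))
```

The output has the same shape as the input, so a block of this kind could in principle sit between convolution layers. How the paper actually embeds PMA into AFE and fuses multi-scale features is not stated in the abstract.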
Pages: 545-567
Number of pages: 23
Related papers
50 in total
  • [11] MULTI-ATTENTION GHOSTNET FOR DEFORESTATION DETECTION IN THE AMAZON RAINFOREST
    Adarme, M. X. Ortega
    Costa, G. A. O. P.
    Feitosa, R. Q.
    XXIV ISPRS CONGRESS: IMAGING TODAY, FORESEEING TOMORROW, COMMISSION III, 2022, 5-3 : 657 - 664
  • [12] Event Extraction with Deep Contextualized Word Representation and Multi-attention Layer
    Ding, Ruixue
    Li, Zhoujun
    ADVANCED DATA MINING AND APPLICATIONS, ADMA 2018, 2018, 11323 : 189 - 201
  • [13] Attention mechanism combined with residual recurrent neural network for sound event detection and localization
    Lan, Chaofeng
    Zhang, Lei
    Zhang, Yuanyuan
    Fu, Lirong
    Sun, Chao
    Han, Yulan
    Zhang, Meng
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2022, 2022 (01)
  • [14] Attention mechanism combined with residual recurrent neural network for sound event detection and localization
    Chaofeng Lan
    Lei Zhang
    Yuanyuan Zhang
    Lirong Fu
    Chao Sun
    Yulan Han
    Meng Zhang
    EURASIP Journal on Audio, Speech, and Music Processing, 2022
  • [15] Single image deraining via a recurrent multi-attention enhancement network
    Liu, Yuetong
    Zhang, Rui
    Zhang, Yunfeng
    Yao, Xunxiang
    Han, Huijian
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2023, 113
  • [16] Sound Event Localization and Detection Using Imbalanced Real and Synthetic Data via Multi-Generator
    Shin, Yeongseo
    Chun, Chanjun
    SENSORS, 2023, 23 (07)
  • [17] Hybrid multi-attention transformer for robust video object detection
    Moorthy, Sathishkumar
    K.S., Sachin Sakthi
    Arthanari, Sathiyamoorthi
    Jeong, Jae Hoon
    Joo, Young Hoon
    Engineering Applications of Artificial Intelligence, 2025, 139
  • [18] Video Captioning using Hierarchical Multi-Attention Model
    Xiao, Huanhou
    Shi, Jinglun
    ICAIP 2018: 2018 THE 2ND INTERNATIONAL CONFERENCE ON ADVANCES IN IMAGE PROCESSING, 2018, : 96 - 101
  • [19] Joint Learning With BERT-GCN and Multi-Attention for Event Text Classification and Event Assignment
    She, Xiangrong
    Chen, Jianpeng
    Chen, Gang
    IEEE ACCESS, 2022, 10 : 27031 - 27040
  • [20] DEEPFAKE SATELLITE IMAGERY DETECTION WITH MULTI-ATTENTION AND SUPER RESOLUTION
    Ciftci, Umur Aybars
    Demir, Ilke
    IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 4871 - 4874