Sound Event Localization and Detection Using Parallel Multi-attention Enhancement

Cited by: 1
Authors
Chen, Zhengyu [1]
Huang, Qinghua [1]
Affiliations
[1] Shanghai Univ, Sch Commun & Informat Engn, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Sound event localization and detection; Parallel multi-attention; Global information; Feature fusion; DEEP NEURAL-NETWORKS; RECOGNITION;
DOI
10.1007/s00034-023-02489-x
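Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809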
Abstract
Sound event localization and detection (SELD), which combines sound event detection with direction-of-arrival estimation, is an emerging audio signal processing task that is widely applied. A popular convolutional recurrent neural network (CRNN)-based method uses a convolutional neural network (CNN) to extract high-level spatial features from manually designed features and a recurrent neural network to model sequential context. Some studies have shown that a plain CNN is not robust in challenging acoustic environments with overlapping, moving, or discontinuous sources. To improve SELD performance in more complex acoustic scenes, this paper proposes parallel multi-attention enhancement (PMAE), a convolution enhancement method that boosts the representation ability of the CNN. PMAE consists of an attention feature enhancement (AFE) block and a parallel multi-attention (PMA) block. PMA, embedded in AFE, extracts enhanced global-local features using efficient attention modules applied along different dimensions. AFE, as a feature fusion strategy, fuses multi-scale enhanced features to improve feature representation. AFE performs well on overlapping sources. PMA extracts characteristic information of different sound events and, when combined with AFE, performs better on moving and discontinuous sources. With this framework, the SELD system remains robust when the target sources are moving and overlap with unknown interference classes. Simulations show that the proposed PMAE considerably improves SELD performance without additional techniques such as data augmentation or post-processing.
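The idea of applying attention in parallel along different dimensions can be illustrated with a minimal sketch. It is not the PMA block from the paper, whose exact design the abstract does not specify. It assumes a feature map laid out as nested lists of shape (channels, time, frequency) and uses parameter-free mean-pooling gates where a real attention module would use learned weights. The names `axis_gates` and `parallel_attention` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def axis_gates(feat, axis):
    """Mean-pool over the two other axes, then squash to gates in (0, 1).

    Assumption: a parameter-free stand-in for a learned attention module.
    """
    sizes = (len(feat), len(feat[0]), len(feat[0][0]))
    sums = [0.0] * sizes[axis]
    for c in range(sizes[0]):
        for t in range(sizes[1]):
            for f in range(sizes[2]):
                sums[(c, t, f)[axis]] += feat[c][t][f]
    # Number of elements pooled into each gate.
    count = sizes[0] * sizes[1] * sizes[2] / sizes[axis]
    return [sigmoid(s / count) for s in sums]


def parallel_attention(feat):
    """Re-weight feat with three branches computed independently.

    One branch per axis (channel, time, frequency); the branch weights
    are averaged and applied with a residual connection, x + x * w.
    """
    ch_gate, time_gate, freq_gate = (axis_gates(feat, a) for a in range(3))
    out = []
    for c, plane in enumerate(feat):
        out_plane = []
        for t, row in enumerate(plane):
            out_row = []
            for f, value in enumerate(row):
                weight = (ch_gate[c] + time_gate[t] + freq_gate[f]) / 3.0
                out_row.append(value * (1.0 + weight))
            out_plane.append(out_row)
        out.append(out_plane)
    return out


if __name__ == "__main__":
    # Toy feature map: 2 channels, 3 time frames, 4 frequency bins.
    toy = [[[float(c + t + f) for f in range(4)] for t in range(3)]
           for c in range(2)]
    enhanced = parallel_attention(toy)
    print(len(enhanced), len(enhanced[0]), len(enhanced[0][0]))
```

The three gate vectors are computed from the same input rather than one after another, which is what makes the branches parallel. Averaging the branches is only one possible way to combine them.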
Pages: 545-567
Number of pages: 23
Related papers
50 in total
  • [1] Sound Event Localization and Detection Using Parallel Multi-attention Enhancement
    Zhengyu Chen
    Qinghua Huang
    Circuits, Systems, and Signal Processing, 2024, 43 (1) : 545 - 567
  • [2] Sound Event Localization and Detection Based on Dual Attention
    Xu, Chundong
    Liu, Hao
    Min, Yuan
    Zhen, Yadi
    Computer Engineering and Applications, 2023, 59 (19) : 99 - 105
  • [3] Lightweight underwater object detection based on image enhancement and multi-attention
    Tian, Tian
    Cheng, Jixiang
    Wu, Dan
    Li, Zhidan
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (23) : 63075 - 63093
  • [4] MULTI-ATTENTION NETWORK FOR THORACIC DISEASE CLASSIFICATION AND LOCALIZATION
    Ma, Yanbo
    Zhou, Qiuhao
    Chen, Xuesong
    Lu, Haihua
    Zhao, Yong
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 1378 - 1382
  • [5] Polyphonic sound event localization and detection based on Multiple Attention Fusion ResNet
    Zhang S.
    Zhang Y.
    Liao Y.
    Pang K.
    Wan Z.
    Zhou S.
    Mathematical Biosciences and Engineering, 2024, 21 (02) : 2004 - 2023
  • [6] Multi-attention embedded network for salient object detection
    Wei He
    Chen Pan
    Wenlong Xu
    Ning Zhang
    Soft Computing, 2021, 25 : 13053 - 13067
  • [7] Event Specific Attention for Polyphonic Sound Event Detection
    Sundar, Harshavardhan
    Sun, Ming
    Wang, Chao
    INTERSPEECH 2021, 2021, : 566 - 570
  • [8] Multi-Attention Network for Sewage Treatment Plant Detection
    Shuai, Yue
    Xie, Jun
    Lu, Kaixuan
    Chen, Zhengchao
    SUSTAINABILITY, 2023, 15 (07)
  • [9] Multi-attention embedded network for salient object detection
    He, Wei
    Pan, Chen
    Xu, Wenlong
    Zhang, Ning
    SOFT COMPUTING, 2021, 25 (20) : 13053 - 13067
  • [10] MultiANet: a Multi-Attention Network for Defocus Blur Detection
    Jiang, Zeyu
    Xu, Xun
    Zhang, Chao
    Zhu, Ce
    2020 IEEE 22ND INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2020,