Sound Event Localization and Detection Using Parallel Multi-attention Enhancement

Cited by: 1
Authors
Chen, Zhengyu [1]
Huang, Qinghua [1]
Affiliations
[1] Shanghai Univ, Sch Commun & Informat Engn, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Sound event localization and detection; Parallel multi-attention; Global information; Feature fusion; DEEP NEURAL-NETWORKS; RECOGNITION;
DOI
10.1007/s00034-023-02489-x
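Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract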
As a combination of sound event detection and direction-of-arrival estimation, sound event localization and detection (SELD) is an emerging audio signal processing task that is widely applied in many areas. A popular convolutional recurrent neural network (CRNN)-based method uses a convolutional neural network (CNN) to extract high-level spatial features from manually designed features and a recurrent neural network to model sequential context. Some studies have shown that a standard CNN is not robust in challenging acoustic environments, such as those with overlapping, moving and discontinuous sources. To improve SELD performance in more complex acoustic scenes, this paper proposes parallel multi-attention enhancement (PMAE), a convolution enhancement method that boosts the representation ability of the CNN. PMAE consists of attention feature enhancement (AFE) and a parallel multi-attention (PMA) block. PMA, embedded into AFE, extracts enhanced global-local features using efficient attention modules along different dimensions. AFE, as a feature fusion strategy, fuses multi-scale enhanced features to improve feature representation. AFE performs well on overlapping sources. PMA adequately extracts the characteristic information of different sound events and, when combined with AFE, performs better on moving and discontinuous sources. With this framework, the SELD system remains robust while the target sources are moving and overlapping with interference from unknown classes. Simulations show that the proposed PMAE substantially improves SELD performance without other techniques such as data augmentation and post-processing.
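The idea of applying attention along different dimensions in parallel can be illustrated with a toy sketch. This is not the authors' PMA or AFE design, which the abstract does not specify: the function names (`axis_attention`, `parallel_multi_attention`, `fuse_with_residual`), the parameter-free sigmoid gates, the averaging of branches, and the residual fusion are all assumptions made purely for illustration, and the multi-scale fusion performed by AFE is not modelled here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def axis_attention(x, axis):
    """Gate x along one axis using a sigmoid of its mean over all other axes."""
    other = tuple(a for a in range(x.ndim) if a != axis)
    summary = x.mean(axis=other, keepdims=True)  # global context per index on `axis`
    return x * sigmoid(summary)                  # broadcast the gate back onto x

def parallel_multi_attention(x):
    """Hypothetical block: gate every axis in parallel, then average the branches."""
    branches = [axis_attention(x, axis) for axis in range(x.ndim)]
    return np.mean(branches, axis=0)

def fuse_with_residual(x):
    """Hypothetical fusion step: add the attention-enhanced features to the input."""
    return x + parallel_multi_attention(x)

rng = np.random.default_rng(0)
features = rng.standard_normal((4, 10, 16))  # assumed layout: (channels, time, frequency)
enhanced = fuse_with_residual(features)
print(enhanced.shape)
```

In a trained network each gate would normally be produced by learned layers rather than a plain mean, so the sketch only shows how the parallel branches and the fusion fit together.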
Pages: 545-567
Number of pages: 23
Related papers
50 in total
  • [31] A Multi-Attention Fusion Mechanism for Collaborative Industrial Surface Defect Detection
    Yue, Xiaoli
    Zhong, Guoqiang
    Chu, Boce
    FOURTEENTH INTERNATIONAL CONFERENCE ON GRAPHICS AND IMAGE PROCESSING, ICGIP 2022, 2022, 12705
  • [32] Multi-Attention Pyramid Context Network for Infrared Small Ship Detection
    Guo, Feng
    Ma, Hongbing
    Li, Liangliang
    Lv, Ming
    Jia, Zhenhong
    JOURNAL OF MARINE SCIENCE AND ENGINEERING, 2024, 12 (02)
  • [33] Polyphonic sound event localization and detection using channel-wise FusionNet
    Spoorthy, V.
    Koolagudi, Shashidhar G.
    APPLIED INTELLIGENCE, 2024, 54 (06) : 5015 - 5026
  • [34] Noise Robust Sound Event Detection Using Deep Learning and Audio Enhancement
    Wan, Tongtang
    Zhou, Yi
    Ma, Yongbao
    Liu, Hongqing
    2019 IEEE 19TH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (ISSPIT 2019), 2019,
  • [35] Parallel Capsule Neural Networks for Sound Event Detection
    Liang, Kai-Wen
    Tseng, Yu-Hao
    Chang, Pao-Chi
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 1933 - 1936
  • [36] Sound event localization and detection based on deep learning
    ZHAO Dada
    DING Kai
    QI Xiaogang
    CHEN Yu
    FENG Hailin
    Journal of Systems Engineering and Electronics, 2024, 35 (02) : 294 - 301
  • [37] Efficient Sound Event Localization and Detection in the Quaternion Domain
    Brignone, Christian
    Mancini, Gioia
    Grassucci, Eleonora
    Uncini, Aurelio
    Comminiello, Danilo
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2022, 69 (05) : 2453 - 2457
  • [38] A Model Ensemble Approach for Sound Event Localization and Detection
    Wang, Qing
    Wu, Huaxin
    Jing, Zijun
    Ma, Feng
    Fang, Yi
    Wang, Yuxuan
    Chen, Tairan
    Pan, Jia
    Du, Jun
    Lee, Chin-Hui
    2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,