EISNet: A Multi-Modal Fusion Network for Semantic Segmentation With Events and Images

Times Cited: 1
Authors
Xie, Bochen [1 ]
Deng, Yongjian [2 ]
Shao, Zhanpeng [3 ]
Li, Youfu [1 ]
Affiliations
[1] City Univ Hong Kong, Dept Mech Engn, Hong Kong, Peoples R China
[2] Beijing Univ Technol, Coll Comp Sci, Beijing 100124, Peoples R China
[3] Hunan Normal Univ, Coll Informat Sci & Engn, Changsha 410081, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Semantic segmentation; Cameras; Standards; Visualization; Task analysis; Semantics; Noise measurement; Event camera; multi-modal fusion; attention mechanism; semantic segmentation; VISION;
DOI
10.1109/TMM.2024.3380255
Chinese Library Classification (CLC)
TP [Automation Technology and Computer Technology];
Discipline Code
0812;
Abstract
Bio-inspired event cameras record a scene as sparse and asynchronous "events" by detecting per-pixel brightness changes. Such cameras show great potential in challenging scene understanding tasks, benefiting from the imaging advantages of high dynamic range and high temporal resolution. Considering the complementarity between event and standard cameras, we propose a multi-modal fusion network (EISNet) to improve the semantic segmentation performance. The key challenges of this topic lie in (i) how to encode event data to represent accurate scene information and (ii) how to fuse multi-modal complementary features by considering the characteristics of two modalities. To solve the first challenge, we propose an Activity-Aware Event Integration Module (AEIM) to convert event data into frame-based representations with high-confidence details via scene activity modeling. To tackle the second challenge, we introduce the Modality Recalibration and Fusion Module (MRFM) to recalibrate modal-specific representations and then aggregate multi-modal features at multiple stages. MRFM learns to generate modal-oriented masks to guide the merging of complementary features, achieving adaptive fusion. Based on these two core designs, our proposed EISNet adopts an encoder-decoder transformer architecture for accurate semantic segmentation using events and images. Experimental results show that our model outperforms state-of-the-art methods by a large margin on event-based semantic segmentation datasets.
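The abstract gives no implementation details, but the fusion idea it describes (recalibrate each modality-specific representation, then merge the two streams with learned modal-oriented masks) can be sketched in code. The following is a minimal, hypothetical PyTorch illustration, not the authors' MRFM: the class name SimpleRecalibrationFusion, the channel-gate layers, and the softmax mask head are assumptions made for illustration only.

# Illustrative sketch only. The exact MRFM layer layout is not given in this record,
# so everything below is an assumption, not the published EISNet implementation.
import torch
import torch.nn as nn

class SimpleRecalibrationFusion(nn.Module):
    """Hypothetical stand-in for one recalibration-and-fusion stage:
    gate each modality channel-wise, then merge the two streams with
    per-pixel modal-oriented masks."""
    def __init__(self, channels: int):
        super().__init__()
        # Channel-wise recalibration (squeeze-and-excitation style), one gate per modality.
        self.img_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.evt_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Modal-oriented spatial masks predicted from the concatenated features.
        self.mask_head = nn.Sequential(
            nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1),
            nn.Softmax(dim=1),  # per-pixel weights for the image vs. event stream
        )

    def forward(self, img_feat: torch.Tensor, evt_feat: torch.Tensor) -> torch.Tensor:
        # Recalibrate each modality-specific representation.
        img_feat = img_feat * self.img_gate(img_feat)
        evt_feat = evt_feat * self.evt_gate(evt_feat)
        # Predict masks and merge the complementary features adaptively.
        masks = self.mask_head(torch.cat([img_feat, evt_feat], dim=1))
        return masks[:, 0:1] * img_feat + masks[:, 1:2] * evt_feat

if __name__ == "__main__":
    fuse = SimpleRecalibrationFusion(channels=64)
    img = torch.randn(1, 64, 32, 32)   # image-branch features
    evt = torch.randn(1, 64, 32, 32)   # event-branch features (e.g., built from event frames)
    print(fuse(img, evt).shape)        # torch.Size([1, 64, 32, 32])

Running the example prints torch.Size([1, 64, 32, 32]). In the paper's design, such a fusion step is applied at multiple encoder stages, with the event branch fed by the frame-based representations produced by AEIM; the single-stage module above is only meant to make that idea concrete.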
Pages: 8639-8650
Number of Pages: 12