EISNet: A Multi-Modal Fusion Network for Semantic Segmentation With Events and Images

Times Cited: 1
Authors
Xie, Bochen [1 ]
Deng, Yongjian [2 ]
Shao, Zhanpeng [3 ]
Li, Youfu [1 ]
Affiliations
[1] City Univ Hong Kong, Dept Mech Engn, Hong Kong, Peoples R China
[2] Beijing Univ Technol, Coll Comp Sci, Beijing 100124, Peoples R China
[3] Hunan Normal Univ, Coll Informat Sci & Engn, Changsha 410081, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Semantic segmentation; Cameras; Standards; Visualization; Task analysis; Semantics; Noise measurement; Event camera; multi-modal fusion; attention mechanism; semantic segmentation; VISION;
DOI
10.1109/TMM.2024.3380255
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Bio-inspired event cameras record a scene as sparse and asynchronous "events" by detecting per-pixel brightness changes. Such cameras show great potential in challenging scene understanding tasks, benefiting from the imaging advantages of high dynamic range and high temporal resolution. Considering the complementarity between event and standard cameras, we propose a multi-modal fusion network (EISNet) to improve the semantic segmentation performance. The key challenges of this topic lie in (i) how to encode event data to represent accurate scene information and (ii) how to fuse multi-modal complementary features by considering the characteristics of two modalities. To solve the first challenge, we propose an Activity-Aware Event Integration Module (AEIM) to convert event data into frame-based representations with high-confidence details via scene activity modeling. To tackle the second challenge, we introduce the Modality Recalibration and Fusion Module (MRFM) to recalibrate modal-specific representations and then aggregate multi-modal features at multiple stages. MRFM learns to generate modal-oriented masks to guide the merging of complementary features, achieving adaptive fusion. Based on these two core designs, our proposed EISNet adopts an encoder-decoder transformer architecture for accurate semantic segmentation using events and images. Experimental results show that our model outperforms state-of-the-art methods by a large margin on event-based semantic segmentation datasets.
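The abstract names two modules whose internals are not spelled out in this record, so the two sketches below illustrate the generic techniques they build on rather than the published designs. First, a minimal NumPy sketch of the standard precursor to a module like AEIM: accumulating an asynchronous event stream into a fixed-size, frame-based representation with temporal bins (the function name, event layout, and bin count are assumptions for illustration):

```python
import numpy as np

def events_to_frame(events, height, width, num_bins=5):
    """Accumulate asynchronous events into a voxel-grid-style frame.

    events: (N, 4) array of [x, y, timestamp, polarity in {-1, +1}].
    Returns a (num_bins, height, width) float32 array.
    """
    frame = np.zeros((num_bins, height, width), dtype=np.float32)
    if len(events) == 0:
        return frame
    t = events[:, 2]
    # Map each timestamp to a temporal bin in [0, num_bins - 1].
    span = max(float(t.max() - t.min()), 1e-9)
    bins = ((t - t.min()) / span * (num_bins - 1)).astype(int)
    xs = events[:, 0].astype(int)
    ys = events[:, 1].astype(int)
    pol = events[:, 3].astype(np.float32)
    # Signed accumulation: ON events add, OFF events subtract.
    np.add.at(frame, (bins, ys, xs), pol)
    return frame
```

Second, a minimal PyTorch sketch of the recalibrate-then-fuse pattern the abstract attributes to MRFM: each modality is rescaled by a channel gate, then a learned per-pixel mask blends the two streams. The gate and mask designs here are hypothetical stand-ins, not the paper's module:

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Illustrative mask-guided fusion of image and event features."""

    def __init__(self, channels):
        super().__init__()
        # Channel gates (squeeze-and-excitation style) recalibrate each modality.
        self.img_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid()
        )
        self.evt_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid()
        )
        # A single-channel spatial mask decides, per pixel, how to blend.
        self.mask = nn.Sequential(
            nn.Conv2d(2 * channels, 1, 3, padding=1), nn.Sigmoid()
        )

    def forward(self, img_feat, evt_feat):
        img_feat = img_feat * self.img_gate(img_feat)
        evt_feat = evt_feat * self.evt_gate(evt_feat)
        m = self.mask(torch.cat([img_feat, evt_feat], dim=1))
        # m -> 1 favors the image stream; m -> 0 favors the event stream.
        return m * img_feat + (1.0 - m) * evt_feat
```

Applied at multiple encoder stages, as the abstract describes, such a block lets the network lean on image features in well-lit, static regions and on event features under fast motion or extreme lighting.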
Pages: 8639-8650
Page Count: 12
Related Papers
50 records in total
  • [1] MFMamba: A Mamba-Based Multi-Modal Fusion Network for Semantic Segmentation of Remote Sensing Images
    Wang, Yan
    Cao, Li
    Deng, He
    SENSORS, 2024, 24 (22)
  • [2] MFTransNet: A Multi-Modal Fusion with CNN-Transformer Network for Semantic Segmentation of HSR Remote Sensing Images
    He, Shumeng
    Yang, Houqun
    Zhang, Xiaoying
    Li, Xuanyu
    MATHEMATICS, 2023, 11 (03)
  • [3] Application of Multi-modal Fusion Attention Mechanism in Semantic Segmentation
    Liu, Yunlong
    Yoshie, Osamu
    Watanabe, Hiroshi
    COMPUTER VISION - ACCV 2022, PT VII, 2023, 13847 : 378 - 397
  • [4] Semantic Segmentation of Defects in Infrastructures through Multi-modal Images
    Shahsavarani, Sara
    Lopez, Fernando
    Ibarra-Castanedo, Clemente
    Maldague, Xavier P. V.
    THERMOSENSE: THERMAL INFRARED APPLICATIONS XLVI, 2024, 13047
  • [5] DFAMNet: dual fusion attention multi-modal network for semantic segmentation on LiDAR point clouds
    Li, Mingjie
    Wang, Gaihua
    Zhu, Minghao
    Li, Chunzheng
    Liu, Hong
    Pan, Xuran
    Long, Qian
    APPLIED INTELLIGENCE, 2024, 54 (04) : 3169 - 3180
  • [6] Multi-modal semantic image segmentation
    Pemasiri, Akila
    Nguyen, Kien
    Sridharan, Sridha
    Fookes, Clinton
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2021, 202
  • [7] Multi-modal dataset and fusion network for simultaneous semantic segmentation of on-road dynamic objects
    Cho, Jieun
    Ha, Jinsu
    Song, Hamin
    Jang, Sungmoon
    Jo, Kichun
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 143
  • [8] TAG-fusion: Two-stage attention guided multi-modal fusion network for semantic segmentation
    Zhang, Zhizhou
    Wang, Wenwu
    Zhu, Lei
    Tang, Zhibin
    DIGITAL SIGNAL PROCESSING, 2025, 156
  • [9] A multi-modal and multi-stage fusion enhancement network for segmentation based on OCT and OCTA images
    Quan, Xiongwen
    Hou, Guangyao
    Yin, Wenya
    Zhang, Han
    INFORMATION FUSION, 2025, 113