EISNet: A Multi-Modal Fusion Network for Semantic Segmentation With Events and Images

Times Cited: 1
Authors
Xie, Bochen [1 ]
Deng, Yongjian [2 ]
Shao, Zhanpeng [3 ]
Li, Youfu [1 ]
Affiliations
[1] City Univ Hong Kong, Dept Mech Engn, Hong Kong, Peoples R China
[2] Beijing Univ Technol, Coll Comp Sci, Beijing 100124, Peoples R China
[3] Hunan Normal Univ, Coll Informat Sci & Engn, Changsha 410081, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantic segmentation; Cameras; Standards; Visualization; Task analysis; Semantics; Noise measurement; Event camera; multi-modal fusion; attention mechanism; semantic segmentation; VISION;
DOI
10.1109/TMM.2024.3380255
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Bio-inspired event cameras record a scene as sparse and asynchronous "events" by detecting per-pixel brightness changes. Such cameras show great potential in challenging scene understanding tasks, benefiting from the imaging advantages of high dynamic range and high temporal resolution. Considering the complementarity between event and standard cameras, we propose a multi-modal fusion network (EISNet) to improve the semantic segmentation performance. The key challenges of this topic lie in (i) how to encode event data to represent accurate scene information and (ii) how to fuse multi-modal complementary features by considering the characteristics of two modalities. To solve the first challenge, we propose an Activity-Aware Event Integration Module (AEIM) to convert event data into frame-based representations with high-confidence details via scene activity modeling. To tackle the second challenge, we introduce the Modality Recalibration and Fusion Module (MRFM) to recalibrate modal-specific representations and then aggregate multi-modal features at multiple stages. MRFM learns to generate modal-oriented masks to guide the merging of complementary features, achieving adaptive fusion. Based on these two core designs, our proposed EISNet adopts an encoder-decoder transformer architecture for accurate semantic segmentation using events and images. Experimental results show that our model outperforms state-of-the-art methods by a large margin on event-based semantic segmentation datasets.
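The abstract describes MRFM as learning modal-oriented masks that guide the merging of complementary image and event features. A minimal sketch of that mask-guided fusion idea is below; it is an illustrative reconstruction, not the paper's actual MRFM. The module name `MaskGuidedFusion`, the 1x1-convolution mask generators, and the single-stage design are all assumptions for demonstration.

```python
import torch
import torch.nn as nn


class MaskGuidedFusion(nn.Module):
    """Hypothetical sketch of mask-guided multi-modal fusion.

    Two sigmoid masks, each conditioned on the concatenated features
    of both modalities, weight the image and event branches before
    they are summed. This mirrors the abstract's description of
    modal-oriented masks guiding adaptive fusion; it is NOT the
    paper's exact MRFM.
    """

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convs producing per-pixel, per-channel gates in [0, 1]
        self.mask_image = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1), nn.Sigmoid()
        )
        self.mask_event = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1), nn.Sigmoid()
        )

    def forward(self, f_img: torch.Tensor, f_evt: torch.Tensor) -> torch.Tensor:
        # Joint context from both modalities drives both masks
        joint = torch.cat([f_img, f_evt], dim=1)
        m_img = self.mask_image(joint)
        m_evt = self.mask_event(joint)
        # Adaptive weighted merge of complementary features
        return m_img * f_img + m_evt * f_evt
```

In this sketch the fused output keeps the per-stage feature shape, so the module could slot between encoder stages of a two-branch backbone, with one instance per fusion stage.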
Pages: 8639-8650
Page count: 12