EISNet: A Multi-Modal Fusion Network for Semantic Segmentation With Events and Images

Times Cited: 1
Authors
Xie, Bochen [1 ]
Deng, Yongjian [2 ]
Shao, Zhanpeng [3 ]
Li, Youfu [1 ]
Affiliations
[1] City Univ Hong Kong, Dept Mech Engn, Hong Kong, Peoples R China
[2] Beijing Univ Technol, Coll Comp Sci, Beijing 100124, Peoples R China
[3] Hunan Normal Univ, Coll Informat Sci & Engn, Changsha 410081, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantic segmentation; Cameras; Standards; Visualization; Task analysis; Semantics; Noise measurement; Event camera; multi-modal fusion; attention mechanism; semantic segmentation; VISION;
DOI
10.1109/TMM.2024.3380255
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Bio-inspired event cameras record a scene as sparse and asynchronous "events" by detecting per-pixel brightness changes. Such cameras show great potential in challenging scene understanding tasks, benefiting from the imaging advantages of high dynamic range and high temporal resolution. Considering the complementarity between event and standard cameras, we propose a multi-modal fusion network (EISNet) to improve the semantic segmentation performance. The key challenges of this topic lie in (i) how to encode event data to represent accurate scene information and (ii) how to fuse multi-modal complementary features by considering the characteristics of two modalities. To solve the first challenge, we propose an Activity-Aware Event Integration Module (AEIM) to convert event data into frame-based representations with high-confidence details via scene activity modeling. To tackle the second challenge, we introduce the Modality Recalibration and Fusion Module (MRFM) to recalibrate modal-specific representations and then aggregate multi-modal features at multiple stages. MRFM learns to generate modal-oriented masks to guide the merging of complementary features, achieving adaptive fusion. Based on these two core designs, our proposed EISNet adopts an encoder-decoder transformer architecture for accurate semantic segmentation using events and images. Experimental results show that our model outperforms state-of-the-art methods by a large margin on event-based semantic segmentation datasets.
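The abstract describes MRFM as learning modal-oriented masks that guide the merging of complementary image and event features. A minimal sketch of that mask-guided fusion idea is below; it is an illustrative reconstruction, not the paper's actual MRFM. The module name `MaskGuidedFusion`, the 1x1-convolution mask generators, and the single-stage design are all assumptions for demonstration.

```python
import torch
import torch.nn as nn


class MaskGuidedFusion(nn.Module):
    """Hypothetical sketch of mask-guided multi-modal fusion.

    Two sigmoid masks, each conditioned on the concatenated features
    of both modalities, weight the image and event branches before
    they are summed. This mirrors the abstract's description of
    modal-oriented masks guiding adaptive fusion; it is NOT the
    paper's exact MRFM.
    """

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convs producing per-pixel, per-channel gates in [0, 1]
        self.mask_image = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1), nn.Sigmoid()
        )
        self.mask_event = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1), nn.Sigmoid()
        )

    def forward(self, f_img: torch.Tensor, f_evt: torch.Tensor) -> torch.Tensor:
        # Joint context from both modalities drives both masks
        joint = torch.cat([f_img, f_evt], dim=1)
        m_img = self.mask_image(joint)
        m_evt = self.mask_event(joint)
        # Adaptive weighted merge of complementary features
        return m_img * f_img + m_evt * f_evt
```

In this sketch the fused output keeps the per-stage feature shape, so the module could slot between encoder stages of a two-branch backbone, with one instance per fusion stage.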
Pages: 8639-8650
Page count: 12