Attention-Based Multi-Modal Fusion Network for Semantic Scene Completion

Cited by: 0
Authors
Li, Siqi [1]
Zou, Changqing [2]
Li, Yipeng [3]
Zhao, Xibin [1]
Gao, Yue [1]
Affiliations
[1] Tsinghua Univ, Sch Software, KLISS, BNRist, Beijing, Peoples R China
[2] Huawei Noah's Ark Lab, Beijing, Peoples R China
[3] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
Keywords
DOI: not available
CLC number: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
This paper presents an end-to-end 3D convolutional network, the attention-based multi-modal fusion network (AMFNet), for semantic scene completion (SSC): inferring the occupancy and semantic labels of a volumetric 3D scene from a single-view RGB-D image. Unlike previous methods that rely only on semantic features extracted from RGB-D images, AMFNet learns to perform 3D scene completion and semantic segmentation jointly by leveraging experience from 2D semantic segmentation of RGB-D images together with the reliable depth cues in the spatial dimension. This is achieved with a multi-modal fusion architecture boosted from 2D semantic segmentation and a 3D semantic completion network empowered by residual attention blocks. We validate our method on the synthetic SUNCG-RGBD dataset and the real NYUv2 dataset, where it outperforms the state-of-the-art method by 2.5% and 2.6%, respectively.
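The record itself contains no implementation details beyond the abstract. As a rough illustration of the "residual attention blocks" mentioned for the 3D semantic completion network, the following is a minimal PyTorch sketch of a 3D residual block with squeeze-and-excitation-style channel attention; the class name ResidualAttentionBlock3D, the SE-style gating, the channel count, and the reduction ratio are all illustrative assumptions rather than the authors' exact design.

```python
# Minimal sketch (assumption): a 3D residual block with channel attention,
# in the spirit of the "residual attention blocks" named in the abstract.
# All layer sizes and names are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class ResidualAttentionBlock3D(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm3d(channels),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm3d(channels),
        )
        # Squeeze-and-excitation-style channel attention over the voxel grid.
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        out = self.body(x)
        out = out * self.attention(out)   # re-weight feature channels
        return self.relu(out + residual)  # residual connection

# Usage: a (batch, channels, depth, height, width) voxel feature volume.
feat = torch.randn(1, 32, 60, 36, 60)
block = ResidualAttentionBlock3D(channels=32)
print(block(feat).shape)  # torch.Size([1, 32, 60, 36, 60])
```

The attention branch pools the voxel volume to a per-channel descriptor and gates the convolutional output before the residual addition, which is one common way such blocks emphasize informative channels in a 3D completion backbone.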
Pages: 11402-11409
Page count: 8