Attention-Based Multi-Modal Fusion Network for Semantic Scene Completion

Cited: 0
Authors
Li, Siqi [1 ]
Zou, Changqing [2 ]
Li, Yipeng [3 ]
Zhao, Xibin [1 ]
Gao, Yue [1 ]
Affiliations
[1] Tsinghua Univ, Sch Software, KLISS, BNRist, Beijing, Peoples R China
[2] Huawei Noah's Ark Lab, Beijing, Peoples R China
[3] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
Keywords
DOI
not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
This paper presents an end-to-end 3D convolutional network named attention-based multi-modal fusion network (AMFNet) for the semantic scene completion (SSC) task of inferring the occupancy and semantic labels of a volumetric 3D scene from single-view RGB-D images. Unlike previous methods, which use only the semantic features extracted from RGB-D images, the proposed AMFNet learns to perform effective 3D scene completion and semantic segmentation simultaneously by leveraging the experience of inferring 2D semantic segmentation from RGB-D images as well as the reliable depth cues in the spatial dimension. This is achieved with a multi-modal fusion architecture boosted from 2D semantic segmentation and a 3D semantic completion network empowered by residual attention blocks. We validate our method on both the synthetic SUNCG-RGBD dataset and the real NYUv2 dataset; the results show that our method outperforms the state-of-the-art method by 2.5% on SUNCG-RGBD and by 2.6% on NYUv2.
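The abstract mentions that the 3D completion network is "empowered by residual attention blocks" but does not specify their design. As a hedged illustration only, the sketch below shows one common form such a block can take: a channel-attention gate applied to a voxel feature volume with an identity skip connection. The function name, the (C, D, H, W) layout, and the `w` channel-mixing weights are all assumptions for illustration, not the paper's actual architecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def residual_attention_block(x, w):
    """Minimal residual channel-attention block over a voxel feature map.

    x : (C, D, H, W) feature volume (channels, depth, height, width)
    w : (C, C) weights of a hypothetical 1x1x1 channel-mixing layer
    """
    desc = x.mean(axis=(1, 2, 3))       # global average pool -> (C,)
    gate = sigmoid(w @ desc)            # per-channel attention weights in (0, 1)
    # identity path plus the attention-modulated path (the residual connection)
    return x + x * gate[:, None, None, None]

# toy usage: 8 channels over a 4x4x4 voxel grid
feats = np.random.rand(8, 4, 4, 4).astype(np.float32)
out = residual_attention_block(feats, np.zeros((8, 8), dtype=np.float32))
```

With zero weights the gate is exactly sigmoid(0) = 0.5, so the block returns 1.5x the input; in a trained network the learned gate would instead re-weight channels before the residual addition.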
Pages: 11402-11409
Page count: 8
Related Papers (50 in total)
  • [1] Multi-modal fusion architecture search for camera-based semantic scene completion
    Wang, Xuzhi
    Feng, Wei
    Wan, Liang
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 243
  • [2] Attention-based multi-modal fusion sarcasm detection
    Liu, Jing
    Tian, Shengwei
    Yu, Long
    Long, Jun
    Zhou, Tiejun
    Wang, Bo
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 44 (02) : 2097 - 2108
  • [3] ARF-Net: a multi-modal aesthetic attention-based fusion
    Iffath, Fariha
    Gavrilova, Marina
    [J]. VISUAL COMPUTER, 2024, 40 (07): : 4941 - 4953
  • [4] AMM-FuseNet: Attention-Based Multi-Modal Image Fusion Network for Land Cover Mapping
    Ma, Wanli
    Karakus, Oktay
    Rosin, Paul L.
    [J]. REMOTE SENSING, 2022, 14 (18)
  • [5] AGGN: Attention-based glioma grading network with multi-scale feature extraction and multi-modal information fusion
    Wu, Peishu
    Wang, Zidong
    Zheng, Baixun
    Li, Han
    Alsaadi, Fuad E.
    Zeng, Nianyin
    [J]. COMPUTERS IN BIOLOGY AND MEDICINE, 2023, 152
  • [6] Application of Multi-modal Fusion Attention Mechanism in Semantic Segmentation
    Liu, Yunlong
    Yoshie, Osamu
    Watanabe, Hiroshi
    [J]. COMPUTER VISION - ACCV 2022, PT VII, 2023, 13847 : 378 - 397
  • [7] Attention-based convolutional neural network with multi-modal temporal information fusion for motor imagery EEG decoding
    Ma, X.
    Chen, W.
    Pei, Z.
    Zhang, Y.
    Chen, J.
    [J]. Computers in Biology and Medicine, 2024, 175
  • [8] An attention-based multi-modal MRI fusion model for major depressive disorder diagnosis
    Zheng, Guowei
    Zheng, Weihao
    Zhang, Yu
    Wang, Junyu
    Chen, Miao
    Wang, Yin
    Cai, Tianhong
    Yao, Zhijun
    Hu, Bin
    [J]. JOURNAL OF NEURAL ENGINEERING, 2023, 20 (06)
  • [9] Attention-Based Multi-Modal Multi-View Fusion Approach for Driver Facial Expression Recognition
    Chen, Jianrong
    Dey, Sujit
    Wang, Lei
    Bi, Ning
    Liu, Peng
    [J]. IEEE Access, 2024, 12 : 137203 - 137221
  • [10] DFAMNet: dual fusion attention multi-modal network for semantic segmentation on LiDAR point clouds
    Li, Mingjie
    Wang, Gaihua
    Zhu, Minghao
    Li, Chunzheng
    Liu, Hong
    Pan, Xuran
    Long, Qian
    [J]. Applied Intelligence, 2024, 54 : 3169 - 3180