Attention-Based Multi-Modal Fusion Network for Semantic Scene Completion

Cited by: 0
Authors:
Li, Siqi [1]
Zou, Changqing [2]
Li, Yipeng [3]
Zhao, Xibin [1]
Gao, Yue [1]
Affiliations:
[1] Tsinghua Univ, Sch Software, KLISS, BNRist, Beijing, Peoples R China
[2] Huawei Noah's Ark Lab, Beijing, Peoples R China
[3] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
Keywords: (none listed)
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
This paper presents an end-to-end 3D convolutional network named attention-based multi-modal fusion network (AMFNet) for the semantic scene completion (SSC) task of inferring the occupancy and semantic labels of a volumetric 3D scene from single-view RGB-D images. In contrast to previous methods, which use only the semantic features extracted from RGB-D images, the proposed AMFNet learns to perform effective 3D scene completion and semantic segmentation simultaneously by leveraging both the experience of inferring 2D semantic segmentation from RGB-D images and the reliable depth cues in the spatial dimension. This is achieved through a multi-modal fusion architecture boosted by 2D semantic segmentation and a 3D semantic completion network empowered by residual attention blocks. We validate our method on the synthetic SUNCG-RGBD dataset and the real NYUv2 dataset; our method outperforms the state-of-the-art method by 2.5% and 2.6% on the two datasets, respectively.
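The fusion idea summarized in the abstract (attention weights deciding, per feature channel, how much each modality contributes to the fused representation) can be illustrated with a minimal sketch. The function names and the per-channel softmax gating below are illustrative assumptions for exposition only, not the paper's actual AMFNet implementation, which operates on learned 3D convolutional feature maps with residual attention blocks.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_fuse(rgb_feat, depth_feat):
    """Illustrative attention-based fusion of two modality feature vectors.

    For each channel, a softmax over the two modality activations yields
    attention weights; the fused channel is the weighted combination, so
    the stronger modality dominates without the weaker one being discarded.
    (Hypothetical gating rule, not the paper's learned attention.)
    """
    fused = []
    for r, d in zip(rgb_feat, depth_feat):
        w_r, w_d = softmax([r, d])
        fused.append(w_r * r + w_d * d)
    return fused

# Toy per-channel features: RGB is confident on channel 0, depth on channel 1.
rgb = [0.9, 0.1, 0.5]
depth = [0.2, 0.8, 0.5]
print(attention_fuse(rgb, depth))
```

Because the weights form a convex combination, each fused channel stays between the two modality activations but is pulled toward the more strongly activated one, which is the intuition behind letting attention arbitrate between RGB appearance cues and depth geometry cues.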
Pages: 11402-11409
Page count: 8
Related papers (50 in total)
  • [31] From Front to Rear: 3D Semantic Scene Completion Through Planar Convolution and Attention-Based Network
    Li, Jie
    Song, Qi
    Yan, Xiaohu
    Chen, Yongquan
    Huang, Rui
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 8294 - 8307
  • [32] FFNet: Frequency Fusion Network for Semantic Scene Completion
    Wang, Xuzhi
    Lin, Di
    Wan, Liang
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 2550 - 2557
  • [33] Latent Edge Guided Depth Super-Resolution Using Attention-Based Hierarchical Multi-Modal Fusion
    Lan, Hui
    Jung, Cheolkon
    IEEE ACCESS, 2024, 12 : 114512 - 114526
  • [34] Scene Text Detection via Deep Semantic Feature Fusion and Attention-based Refinement
    Song, Yu
    Cui, Yuanshun
    Han, Hu
    Shan, Shiguang
    Chen, Xilin
    2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018, : 3747 - 3752
  • [35] AutoAMS: Automated attention-based multi-modal graph learning architecture search
    Al-Sabri, Raeed
    Gao, Jianliang
    Chen, Jiamin
    Oloulade, Babatounde Moctard
    Wu, Zhenpeng
    NEURAL NETWORKS, 2024, 179
  • [36] A Probabilistic Approach for Attention-Based Multi-Modal Human-Robot Interaction
    Begum, Momotaz
    Karray, Fakhri
    Mann, George K. I.
    Gosine, Raymond
    RO-MAN 2009: THE 18TH IEEE INTERNATIONAL SYMPOSIUM ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, VOLS 1 AND 2, 2009, : 909 - +
  • [37] Semantic attention-based heterogeneous feature aggregation network for image fusion
    Ruan, Zhiqiang
    Wan, Jie
    Xiao, Guobao
    Tang, Zhimin
    Ma, Jiayi
    PATTERN RECOGNITION, 2024, 155
  • [38] Attention-based fusion network for RGB-D semantic segmentation
    Zhong, Li
    Guo, Chi
    Zhan, Jiao
    Deng, JingYi
    NEUROCOMPUTING, 2024, 608
  • [39] Attention-Based Multi-Stage Network for Point Cloud Completion
    Yin Xiyang
    Zhou Pei
    Zhu Jiangping
    LASER & OPTOELECTRONICS PROGRESS, 2024, 61 (10)
  • [40] Representation and Fusion Based on Knowledge Graph in Multi-Modal Semantic Communication
    Xing, Chenlin
    Lv, Jie
    Luo, Tao
    Zhang, Zhilong
    IEEE WIRELESS COMMUNICATIONS LETTERS, 2024, 13 (05) : 1344 - 1348