Attention-Based Multi-Modal Fusion Network for Semantic Scene Completion

Cited by: 0
Authors
Li, Siqi [1 ]
Zou, Changqing [2 ]
Li, Yipeng [3 ]
Zhao, Xibin [1 ]
Gao, Yue [1 ]
Affiliations
[1] Tsinghua Univ, Sch Software, KLISS, BNRist, Beijing, Peoples R China
[2] Huawei Noah's Ark Lab, Beijing, Peoples R China
[3] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
Keywords
DOI
Not available
CLC number
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents an end-to-end 3D convolutional network named attention-based multi-modal fusion network (AMFNet) for the semantic scene completion (SSC) task of inferring the occupancy and semantic labels of a volumetric 3D scene from single-view RGB-D images. Unlike previous methods, which use only the semantic features extracted from RGB-D images, the proposed AMFNet learns to perform effective 3D scene completion and semantic segmentation simultaneously by leveraging the experience of inferring 2D semantic segmentation from RGB-D images as well as the reliable depth cues in the spatial dimension. This is achieved by employing a multi-modal fusion architecture boosted from 2D semantic segmentation and a 3D semantic completion network empowered by residual attention blocks. We validate our method on both the synthetic SUNCG-RGBD dataset and the real NYUv2 dataset, and the results show that our method achieves gains of 2.5% on SUNCG-RGBD and 2.6% on NYUv2 over the state-of-the-art method.
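The abstract's key architectural ingredient is the residual attention block, in which a learned attention mask re-weights features while a residual (identity) path preserves the original signal. The sketch below is a minimal, hypothetical illustration of that general pattern in NumPy, not the authors' implementation (all names and the mask parameterization are assumptions for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def residual_attention_block(features, w_att):
    """Illustrative residual attention: a learned mask in (0, 1)
    modulates the features, and the (1 + mask) residual form keeps
    the identity path, so attention amplifies informative channels
    without suppressing the input below its original magnitude."""
    mask = sigmoid(features @ w_att)   # soft attention mask, shape like features
    return features * (1.0 + mask)    # residual modulation

# Toy usage: 4 flattened voxel positions with 8 feature channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w = rng.standard_normal((8, 8)) * 0.1  # hypothetical attention weights
y = residual_attention_block(x, w)
print(y.shape)  # (4, 8)
```

Because the mask lies in (0, 1), the output magnitude stays between the input's and twice the input's, which is one common motivation for the residual formulation: the attention path can only refine, never erase, the completed scene features.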
Pages: 11402 - 11409
Number of pages: 8
Related papers
50 records in total
  • [21] Attention-based multi-modal fusion for improved real estate appraisal: a case study in Los Angeles
    Bin, Junchi
    Gardiner, Bryan
    Liu, Zheng
    Li, Eric
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (22) : 31163 - 31184
  • [22] High-Resolution Depth Maps Imaging via Attention-Based Hierarchical Multi-Modal Fusion
    Zhong, Zhiwei
    Liu, Xianming
    Jiang, Junjun
    Zhao, Debin
    Chen, Zhiwen
    Ji, Xiangyang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 648 - 663
  • [23] Fuel consumption prediction for pre-departure flights using attention-based multi-modal fusion
    Lin, Yi
    Guo, Dongyue
    Wu, Yuankai
    Li, Lishuai
    Wu, Edmond Q.
    Ge, Wenyi
    INFORMATION FUSION, 2024, 101
  • [24] EISNet: A Multi-Modal Fusion Network for Semantic Segmentation With Events and Images
    Xie, Bochen
    Deng, Yongjian
    Shao, Zhanpeng
    Li, Youfu
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 8639 - 8650
  • [25] AGREE: Attention-Based Tour Group Recommendation with Multi-modal Data
    Hu, Fang
    Huang, Xiuqi
    Gao, Xiaofeng
    Chen, Guihai
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, 2019, 11448 : 314 - 318
  • [26] Multi-Modal Fusion Sign Language Recognition Based on Residual Network and Attention Mechanism
    Chu, Chaoqin
    Xiao, Qinkun
    Zhang, Yinhuan
    Liu, Xing
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2022, 36 (12)
  • [27] A co-attention based multi-modal fusion network for review helpfulness prediction
    Ren, Gang
    Diao, Lei
    Guo, Fanjia
    Hong, Taeho
    INFORMATION PROCESSING & MANAGEMENT, 2024, 61 (01)
  • [28] Based on Multi-Feature Information Attention Fusion for Multi-Modal Remote Sensing Image Semantic Segmentation
    Zhang, Chongyu
    2021 IEEE INTERNATIONAL CONFERENCE ON MECHATRONICS AND AUTOMATION (IEEE ICMA 2021), 2021, : 71 - 76
  • [29] A Tri-Attention fusion guided multi-modal segmentation network
    Zhou, Tongxue
    Ruan, Su
    Vera, Pierre
    Canu, Stephane
    PATTERN RECOGNITION, 2022, 124
  • [30] Multi-modal Perception Fusion Method Based on Cross Attention
    Zhang B.-L.
    Pan Z.-H.
    Jiang J.-Z.
    Zhang C.-B.
    Wang Y.-X.
    Yang C.-L.
    Zhongguo Gonglu Xuebao/China Journal of Highway and Transport, 2024, 37 (03): : 181 - 193