Attention-Based Multi-Modal Fusion Network for Semantic Scene Completion

Cited by: 0
Authors
Li, Siqi [1 ]
Zou, Changqing [2 ]
Li, Yipeng [3 ]
Zhao, Xibin [1 ]
Gao, Yue [1 ]
Affiliations
[1] Tsinghua Univ, Sch Software, KLISS, BNRist, Beijing, Peoples R China
[2] Huawei Noah's Ark Lab, Beijing, Peoples R China
[3] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
This paper presents an end-to-end 3D convolutional network named attention-based multi-modal fusion network (AMFNet) for the semantic scene completion (SSC) task of inferring the occupancy and semantic labels of a volumetric 3D scene from single-view RGB-D images. Compared with previous methods, which use only the semantic features extracted from RGB-D images, the proposed AMFNet learns to perform effective 3D scene completion and semantic segmentation simultaneously by leveraging the experience of inferring 2D semantic segmentation from RGB-D images as well as reliable depth cues in the spatial dimension. This is achieved by employing a multi-modal fusion architecture boosted by 2D semantic segmentation and a 3D semantic completion network empowered by residual attention blocks. We validate our method on both the synthetic SUNCG-RGBD dataset and the real NYUv2 dataset; the results show that our method achieves gains of 2.5% and 2.6% on the two datasets, respectively, over the state-of-the-art method.
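The abstract does not give implementation details of the residual attention blocks, but the general residual-attention pattern it names (gating features with a learned attention map, then adding the gated result back to the input) can be illustrated with a minimal sketch. The function name and the flat 1-D feature representation below are assumptions for illustration, not the authors' 3D implementation:

```python
import math

def sigmoid(x):
    # Squash an attention logit into a (0, 1) gate.
    return 1.0 / (1.0 + math.exp(-x))

def residual_attention_block(features, attention_logits):
    # Hypothetical 1-D stand-in for a residual attention block:
    # each feature is scaled by its attention gate, and the input
    # is carried through by a residual connection:
    #   out_i = x_i + sigmoid(a_i) * x_i
    return [x + sigmoid(a) * x for x, a in zip(features, attention_logits)]

# A zero logit gives a 0.5 gate, so each feature grows by half.
print(residual_attention_block([2.0, -4.0], [0.0, 0.0]))  # [3.0, -6.0]
```

The residual term means that even where the attention gate is near zero, the original feature passes through unchanged, which keeps gradients flowing in deep stacks of such blocks.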
Pages: 11402-11409
Page count: 8
Related papers
50 in total
  • [41] Multi-modal Scene Recognition Based on Global Self-attention Mechanism
    Li, Xiang
    Sun, Ning
    Liu, Jixin
    Chai, Lei
    Sun, Haian
    ADVANCES IN NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, ICNC-FSKD 2022, 2023, 153 : 109 - 121
  • [42] A Novel Multi-Modal Network-Based Dynamic Scene Understanding
    Uddin, Md Azher
    Joolee, Joolekha Bibi
    Lee, Young-Koo
    Sohn, Kyung-Ah
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2022, 18 (01)
  • [43] Multi-modal Characteristic Guided Depth Completion Network
    Lee, Yongjin
    Park, Seokjun
    Kang, Beomgu
    Park, HyunWook
    COMPUTER VISION - ACCV 2022, PT III, 2023, 13843 : 593 - 607
  • [44] Audio-Visual Scene Classification Based on Multi-modal Graph Fusion
    Lei, Han
    Chen, Ning
    INTERSPEECH 2022, 2022, : 4157 - 4161
  • [45] A Novel Attention-Based Early Fusion Multi-Modal CNN Approach to Identify Soil Erosion Based on Unmanned Aerial Vehicle
    Miao, Sheng
    Liu, Yufeng
    Liu, Zitong
    Shen, Xiang
    Liu, Chao
    Gao, Weijun
    IEEE ACCESS, 2024, 12 : 95152 - 95164
  • [46] MFMamba: A Mamba-Based Multi-Modal Fusion Network for Semantic Segmentation of Remote Sensing Images
    Wang, Yan
    Cao, Li
    Deng, He
    SENSORS, 2024, 24 (22)
  • [47] ATTENTION DRIVEN FUSION FOR MULTI-MODAL EMOTION RECOGNITION
    Priyasad, Darshana
    Fernando, Tharindu
    Denman, Simon
    Sridharan, Sridha
    Fookes, Clinton
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3227 - 3231
  • [48] Hyper-node Relational Graph Attention Network for Multi-modal Knowledge Graph Completion
    Liang, Shuang
    Zhu, Anjie
    Zhang, Jiasheng
    Shao, Jie
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (02)
  • [49] Mixture of Attention Variants for Modal Fusion in Multi-Modal Sentiment Analysis
    He, Chao
    Zhang, Xinghua
    Song, Dongqing
    Shen, Yingshan
    Mao, Chengjie
    Wen, Huosheng
    Zhu, Dingju
    Cai, Lihua
    BIG DATA AND COGNITIVE COMPUTING, 2024, 8 (02)
  • [50] Attention-based Multi-modal Sentiment Analysis and Emotion Detection in Conversation using RNN
    Huddar, Mahesh G.
    Sannakki, Sanjeev S.
    Rajpurohit, Vijay S.
    INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE, 2021, 6 (06): : 112 - 121