Attention-Based Multi-Modal Fusion Network for Semantic Scene Completion

Cited by: 0
Authors
Li, Siqi [1]
Zou, Changqing [2]
Li, Yipeng [3]
Zhao, Xibin [1]
Gao, Yue [1]
Affiliations
[1] Tsinghua Univ, Sch Software, KLISS, BNRist, Beijing, Peoples R China
[2] Huawei Noah's Ark Lab, Beijing, Peoples R China
[3] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
Keywords
DOI: not available
CLC number: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
This paper presents an end-to-end 3D convolutional network, the attention-based multi-modal fusion network (AMFNet), for semantic scene completion (SSC): inferring the occupancy and semantic labels of a volumetric 3D scene from a single-view RGB-D image. Unlike previous methods that rely only on semantic features extracted from RGB-D images, AMFNet learns to perform 3D scene completion and semantic segmentation jointly by leveraging experience from 2D semantic segmentation of RGB-D images together with the reliable depth cues in the spatial dimension. This is achieved with a multi-modal fusion architecture boosted from 2D semantic segmentation and a 3D semantic completion network empowered by residual attention blocks. We validate our method on the synthetic SUNCG-RGBD dataset and the real NYUv2 dataset, where it outperforms the state-of-the-art method by 2.5% and 2.6%, respectively.
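The record itself contains no implementation details beyond the abstract. As a rough illustration of the "residual attention blocks" mentioned for the 3D semantic completion network, the following is a minimal PyTorch sketch of a 3D residual block with squeeze-and-excitation-style channel attention; the class name ResidualAttentionBlock3D, the SE-style gating, the channel count, and the reduction ratio are all illustrative assumptions rather than the authors' exact design.

```python
# Minimal sketch (assumption): a 3D residual block with channel attention,
# in the spirit of the "residual attention blocks" named in the abstract.
# All layer sizes and names are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class ResidualAttentionBlock3D(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm3d(channels),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm3d(channels),
        )
        # Squeeze-and-excitation-style channel attention over the voxel grid.
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        out = self.body(x)
        out = out * self.attention(out)   # re-weight feature channels
        return self.relu(out + residual)  # residual connection

# Usage: a (batch, channels, depth, height, width) voxel feature volume.
feat = torch.randn(1, 32, 60, 36, 60)
block = ResidualAttentionBlock3D(channels=32)
print(block(feat).shape)  # torch.Size([1, 32, 60, 36, 60])
```

The attention branch pools the voxel volume to a per-channel descriptor and gates the convolutional output before the residual addition, which is one common way such blocks emphasize informative channels in a 3D completion backbone.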
Pages: 11402-11409
Page count: 8