CVSformer: Cross-View Synthesis Transformer for Semantic Scene Completion

Authors
Dong, Haotian [1]
Ma, Enhui [1]
Wang, Lubo [1]
Wang, Miaohui [2]
Xie, Wuyuan [2]
Guo, Qing [3]
Li, Ping [4, 5]
Liang, Lingyu [6]
Yang, Kairui [7]
Lin, Di [1]
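Affiliations
[1] Tianjin University, Tianjin, People's Republic of China
[2] Shenzhen University, Shenzhen, People's Republic of China
[3] The Hong Kong Polytechnic University, Hong Kong, People's Republic of China
[4] Agency for Science, Technology and Research (A*STAR), Institute of High Performance Computing (IHPC), Singapore
[5] Agency for Science, Technology and Research (A*STAR), Centre for Frontier AI Research (CFAR), Singapore
[6] South China University of Technology, Pazhou Lab, Guangzhou, People's Republic of China
[7] Alibaba DAMO Academy, Hangzhou, People's Republic of China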
Funding
National Research Foundation, Singapore
Keywords
FUSION NETWORK
DOI
10.1109/ICCV51070.2023.00815
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Semantic scene completion (SSC) requires an accurate understanding of the geometric and semantic relationships between objects in a 3D scene in order to reason about occluded objects. Popular SSC methods voxelize the 3D objects, allowing a deep 3D convolutional network (3D CNN) to learn object relationships from complex scenes. However, current networks lack controllable kernels for modeling object relationships across multiple views, while appropriate views can provide relevant cues that suggest the existence of occluded objects. In this paper, we propose the Cross-View Synthesis Transformer (CVSformer), which consists of Multi-View Feature Synthesis and a Cross-View Transformer for learning cross-view object relationships. In multi-view feature synthesis, we use a set of differently rotated 3D convolutional kernels to compute multi-view features for each voxel. In the cross-view transformer, we employ cross-view fusion to comprehensively learn cross-view relationships, which provide useful information for enhancing the features of individual views. We use the enhanced features to predict the geometric occupancy and semantic label of every voxel. We evaluate CVSformer on public datasets, where it yields state-of-the-art results. Our code is available at https://github.com/donghaotian123/CVSformer.
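The two stages named in the abstract (rotated-kernel multi-view synthesis, then cross-view fusion that enhances each view) can be illustrated with a toy NumPy sketch. It is not the CVSformer implementation, which lives in the linked repository. Everything here is an assumption for illustration: kernels are rotated in 90° steps in the (x, y) plane with `np.rot90`, the fusion is parameter-free dot-product self-attention across views at each voxel with a residual connection, and the per-voxel classifier is a random linear map. The function names `conv3d_same`, `multi_view_features`, `cross_view_fusion`, and `predict_labels` are hypothetical.

```python
import numpy as np


def conv3d_same(volume, kernel):
    """Naive single-channel 3D cross-correlation with zero 'same' padding.

    Assumes an odd-sized kernel so the output has the volume's shape.
    """
    kx, ky, kz = kernel.shape
    padded = np.pad(volume, ((kx // 2,) * 2, (ky // 2,) * 2, (kz // 2,) * 2))
    out = np.zeros(volume.shape, dtype=float)
    X, Y, Z = volume.shape
    for i in range(kx):
        for j in range(ky):
            for k in range(kz):
                # Accumulate each kernel tap times the correspondingly shifted volume.
                out += kernel[i, j, k] * padded[i:i + X, j:j + Y, k:k + Z]
    return out


def multi_view_features(volume, base_kernels, num_views=4):
    """Hypothetical multi-view synthesis: rotate every base kernel by r * 90 degrees
    in the (x, y) plane (r = 0 .. num_views - 1; at most 4 distinct rotations)
    and convolve the voxel grid with each rotated copy.

    Returns an array of shape (V, C, X, Y, Z): V views, C channels per view.
    """
    views = []
    for r in range(num_views):
        rotated = [np.rot90(kern, k=r, axes=(0, 1)) for kern in base_kernels]
        views.append(np.stack([conv3d_same(volume, kern) for kern in rotated]))
    return np.stack(views)


def cross_view_fusion(features):
    """Per-voxel scaled dot-product self-attention across the V views
    (assumption: no learned Q/K/V projections), added back as a residual."""
    v, c = features.shape[:2]
    tokens = features.reshape(v, c, -1).transpose(2, 0, 1)  # (N voxels, V, C)
    scores = np.einsum("nvc,nwc->nvw", tokens, tokens) / np.sqrt(c)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each view attends over all views
    fused = np.einsum("nvw,nwc->nvc", weights, tokens)
    enhanced = tokens + fused  # enhance each individual view's features
    return enhanced.transpose(1, 2, 0).reshape(features.shape)


def predict_labels(enhanced, classifier):
    """Average over views, apply a (K, C) linear map per voxel, take the argmax.
    Treating class 0 as 'empty' would fold occupancy into the labels (assumption)."""
    pooled = enhanced.mean(axis=0)  # (C, X, Y, Z)
    logits = np.einsum("kc,cxyz->kxyz", classifier, pooled)
    return logits.argmax(axis=0)  # (X, Y, Z) integer labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    volume = (rng.random((8, 8, 8)) > 0.7).astype(float)  # toy occupancy grid
    base_kernels = [rng.standard_normal((3, 3, 3)) for _ in range(4)]  # C = 4
    feats = multi_view_features(volume, base_kernels)
    print(feats.shape)  # (4, 4, 8, 8, 8)
    enhanced = cross_view_fusion(feats)
    labels = predict_labels(enhanced, rng.standard_normal((12, 4)))  # 12 classes, arbitrary
    print(labels.shape)  # (8, 8, 8)
```

Rotating by multiples of 90° keeps the kernel on the voxel lattice, so no interpolation is needed, and rotating both the input and the kernel yields the correspondingly rotated output. The paper's actual rotation angles, projections, and prediction heads may differ from this sketch.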
Pages: 8840-8849
Number of pages: 10