CVSformer: Cross-View Synthesis Transformer for Semantic Scene Completion

Cited by: 0
Authors
Dong, Haotian [1]
Ma, Enhui [1]
Wang, Lubo [1]
Wang, Miaohui [2]
Xie, Wuyuan [2]
Guo, Qing [3]
Li, Ping [4, 5]
Liang, Lingyu [6]
Yang, Kairui [7]
Lin, Di [1]
Affiliations
[1] Tianjin Univ, Tianjin, Peoples R China
[2] Shenzhen Univ, Shenzhen, Peoples R China
[3] Hong Kong Polytech Univ, Hong Kong, Peoples R China
[4] Agcy Sci Technol & Res, IHPC, Singapore, Singapore
[5] Agcy Sci Technol & Res, CFAR, Singapore, Singapore
[6] South China Univ Technol, Pazhou Lab, Guangzhou, Peoples R China
[7] Alibaba Damo Acad, Hangzhou, Peoples R China
Funding
National Research Foundation, Singapore;
Keywords
FUSION NETWORK;
DOI
10.1109/ICCV51070.2023.00815
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Semantic scene completion (SSC) requires an accurate understanding of the geometric and semantic relationships between objects in a 3D scene in order to reason about occluded objects. Popular SSC methods voxelize the 3D objects so that a deep 3D convolutional network (3D CNN) can learn object relationships from complex scenes. However, current networks lack controllable kernels for modeling object relationships across multiple views, where appropriate views provide relevant information suggesting the existence of occluded objects. In this paper, we propose the Cross-View Synthesis Transformer (CVSformer), which consists of Multi-View Feature Synthesis and a Cross-View Transformer for learning cross-view object relationships. In multi-view feature synthesis, we use a set of differently rotated 3D convolutional kernels to compute multi-view features for each voxel. In the cross-view transformer, we employ cross-view fusion to comprehensively learn the cross-view relationships, which form useful information for enhancing the features of individual views. We use the enhanced features to predict the geometric occupancies and semantic labels of all voxels. We evaluate CVSformer on public datasets, where it yields state-of-the-art results. Our code is available at https://github.com/donghaotian123/CVSformer.
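The two stages named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation (see the repository linked above for that), and it rests on several assumptions not stated in this record: the views are 90-degree rotations of each kernel about one axis, the kernels are fixed rather than learned, and cross-view fusion is reduced to plain dot-product attention between the views at each voxel, with no learned projections. The function names `rotated_kernels`, `correlate3d_same`, `multi_view_features`, and `cross_view_fusion` are hypothetical.

```python
import numpy as np

def rotated_kernels(kernel, num_views=4):
    """Copies of one cubic 3D kernel rotated by multiples of 90 degrees.

    Assumption: the rotation angles are illustrative only.
    """
    return [np.rot90(kernel, k=v, axes=(1, 2)) for v in range(num_views)]

def correlate3d_same(volume, kernel):
    """Naive zero-padded 3D cross-correlation; output has the input's shape."""
    r = kernel.shape[0] // 2  # assumes a cubic kernel with odd side length
    padded = np.pad(volume, r)
    d, h, w = volume.shape
    out = np.zeros_like(volume, dtype=float)
    for dz in range(kernel.shape[0]):
        for dy in range(kernel.shape[1]):
            for dx in range(kernel.shape[2]):
                out += kernel[dz, dy, dx] * padded[dz:dz + d, dy:dy + h, dx:dx + w]
    return out

def multi_view_features(volume, kernels, num_views=4):
    """Per-voxel features for each view; returns shape (V, C, D, H, W)."""
    views = []
    for v in range(num_views):
        views.append([
            correlate3d_same(volume, np.rot90(k, k=v, axes=(1, 2)))
            for k in kernels
        ])
    return np.array(views)

def cross_view_fusion(feats):
    """Enhance each view with a softmax-weighted mix of all views, per voxel."""
    num_views, channels = feats.shape[:2]
    # One token per (voxel, view): shape (N, V, C).
    tokens = feats.reshape(num_views, channels, -1).transpose(2, 0, 1)
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(channels)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    enhanced = tokens + weights @ tokens  # residual connection
    return enhanced.transpose(1, 2, 0).reshape(feats.shape), weights

rng = np.random.default_rng(0)
occupancy = (rng.random((8, 8, 8)) > 0.7).astype(float)  # toy voxel grid
kernels = rng.standard_normal((2, 3, 3, 3))              # C = 2 channels
features = multi_view_features(occupancy, kernels)
enhanced, attention = cross_view_fusion(features)
print(features.shape, enhanced.shape, attention.shape)
```

In a full model the enhanced features would feed a prediction head for per-voxel occupancy and semantic labels; that head is omitted here.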
Pages: 8840-8849
Page count: 10