GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

Cited by: 0
Authors
Huang, Yuanhui [1]
Zheng, Wenzhao [1, 2]
Zhang, Yunpeng [3]
Zhou, Jie [1]
Lu, Jiwen [1]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Univ Calif Berkeley, Berkeley, CA 94720 USA
[3] PhiGent Robot, Beijing, Peoples R China
Source
Funding
National Natural Science Foundation of China
Keywords
3D Occupancy Prediction; 3D Gaussian splatting; Autonomous Driving; PRIORS;
DOI
10.1007/978-3-031-73383-3_22
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and semantics of the surrounding scene and is an important task for the robustness of vision-centric autonomous driving. Most existing methods employ dense grids such as voxels as scene representations, which ignore the sparsity of occupancy and the diversity of object scales and thus lead to unbalanced allocation of resources. To address this, we propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians where each Gaussian represents a flexible region of interest and its semantic features. We aggregate information from images through the attention mechanism and iteratively refine the properties of 3D Gaussians including position, covariance, and semantics. We then propose an efficient Gaussian-to-voxel splatting method to generate 3D occupancy predictions, which only aggregates the neighboring Gaussians for a certain position. We conduct extensive experiments on the widely adopted nuScenes and KITTI360 datasets. Experimental results demonstrate that GaussianFormer achieves comparable performance with state-of-the-art methods with only 17.8%-24.8% of their memory consumption. Code is available at: https://github.com/huang-yh/GaussianFormer.
Pages: 376-393
Page count: 18