GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

Cited by: 0
Authors
Huang, Yuanhui [1]
Zheng, Wenzhao [1, 2]
Zhang, Yunpeng [3]
Zhou, Jie [1]
Lu, Jiwen [1]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Univ Calif Berkeley, Berkeley, CA 94720 USA
[3] PhiGent Robot, Beijing, Peoples R China
Source
Funding
National Natural Science Foundation of China;
Keywords
3D Occupancy Prediction; 3D Gaussian splatting; Autonomous Driving; PRIORS;
DOI
10.1007/978-3-031-73383-3_22
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and semantics of the surrounding scene and is an important task for the robustness of vision-centric autonomous driving. Most existing methods employ dense grids such as voxels as scene representations, which ignore the sparsity of occupancy and the diversity of object scales and thus lead to unbalanced allocation of resources. To address this, we propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians where each Gaussian represents a flexible region of interest and its semantic features. We aggregate information from images through the attention mechanism and iteratively refine the properties of 3D Gaussians including position, covariance, and semantics. We then propose an efficient Gaussian-to-voxel splatting method to generate 3D occupancy predictions, which only aggregates the neighboring Gaussians for a certain position. We conduct extensive experiments on the widely adopted nuScenes and KITTI360 datasets. Experimental results demonstrate that GaussianFormer achieves comparable performance with state-of-the-art methods with only 17.8%-24.8% of their memory consumption. Code is available at: https://github.com/huang-yh/GaussianFormer.
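The Gaussian-to-voxel splatting idea in the abstract, where only the neighboring Gaussians contribute to a given position, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a diagonal (axis-aligned) covariance, a simple distance cutoff `radius` as the neighbor test, and a dict layout with the hypothetical keys `mean`, `var`, and `semantics`; the function name `splat_to_point` is likewise an assumption made for illustration.

```python
import math


def splat_to_point(point, gaussians, radius):
    """Accumulate semantic contributions of nearby Gaussians at one 3D point.

    Each Gaussian is a dict with hypothetical keys:
      "mean":      (x, y, z) center
      "var":       (vx, vy, vz) per-axis variances (diagonal covariance, an assumption)
      "semantics": list of per-class scores
    """
    if not gaussians:
        return []
    num_classes = len(gaussians[0]["semantics"])
    out = [0.0] * num_classes
    for g in gaussians:
        diff = [p - m for p, m in zip(point, g["mean"])]
        # Neighbor test (assumed): skip Gaussians centered farther than `radius`.
        if math.sqrt(sum(d * d for d in diff)) > radius:
            continue
        # Squared Mahalanobis distance under the diagonal covariance.
        maha = sum(d * d / v for d, v in zip(diff, g["var"]))
        weight = math.exp(-0.5 * maha)
        for c, s in enumerate(g["semantics"]):
            out[c] += weight * s
    return out


# Two Gaussians; only the first lies within the cutoff of the query point.
gaussians = [
    {"mean": (0.0, 0.0, 0.0), "var": (1.0, 1.0, 1.0), "semantics": [1.0, 0.0]},
    {"mean": (10.0, 0.0, 0.0), "var": (1.0, 1.0, 1.0), "semantics": [0.0, 1.0]},
]
scores = splat_to_point((1.0, 0.0, 0.0), gaussians, radius=3.0)
```

Evaluating this function at every voxel center would give a dense occupancy grid. The cutoff keeps the per-position cost proportional to the number of nearby Gaussians rather than to the total count, although this sketch still loops over all Gaussians to apply the test.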
Pages: 376-393
Page count: 18