GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

Cited by: 0
Authors
Huang, Yuanhui [1 ]
Zheng, Wenzhao [1 ,2 ]
Zhang, Yunpeng [3 ]
Zhou, Jie [1 ]
Lu, Jiwen [1 ]
Institutions
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Univ Calif Berkeley, Berkeley, CA 94720 USA
[3] PhiGent Robot, Beijing, Peoples R China
Source
Funding
National Natural Science Foundation of China
Keywords
3D Occupancy Prediction; 3D Gaussian Splatting; Autonomous Driving; Priors
DOI
10.1007/978-3-031-73383-3_22
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
3D semantic occupancy prediction aims to obtain fine-grained 3D geometry and semantics of the surrounding scene and is an important task for the robustness of vision-centric autonomous driving. Most existing methods employ dense grids such as voxels as scene representations, which ignore the sparsity of occupancy and the diversity of object scales and thus lead to an unbalanced allocation of resources. To address this, we propose an object-centric representation that describes 3D scenes with sparse 3D semantic Gaussians, where each Gaussian represents a flexible region of interest and its semantic features. We aggregate information from images through the attention mechanism and iteratively refine the properties of the 3D Gaussians, including position, covariance, and semantics. We then propose an efficient Gaussian-to-voxel splatting method to generate 3D occupancy predictions, which aggregates only the neighboring Gaussians for a given position. We conduct extensive experiments on the widely adopted nuScenes and KITTI-360 datasets. Experimental results demonstrate that GaussianFormer achieves performance comparable to state-of-the-art methods with only 17.8%-24.8% of their memory consumption. Code is available at: https://github.com/huang-yh/GaussianFormer.
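The Gaussian-to-voxel splatting step described in the abstract can be pictured with a minimal sketch: each voxel center queries only nearby Gaussians and accumulates their per-class semantic logits, weighted by the Gaussian density at that point. The function name, the fixed-radius neighborhood test, and the dense NumPy loop below are illustrative assumptions for exposition, not the paper's released CUDA implementation.

```python
import numpy as np

def gaussian_to_voxel_splat(means, covs, semantics, voxel_centers, radius=2.0):
    """Sketch of Gaussian-to-voxel splatting (hypothetical helper).

    means         : (N, 3)    Gaussian centers
    covs          : (N, 3, 3) positive-definite covariance matrices
    semantics     : (N, C)    per-Gaussian semantic logits
    voxel_centers : (M, 3)    query positions on the voxel grid
    radius        : only Gaussians within this distance contribute
    Returns (M, C) semantic occupancy logits.
    """
    inv_covs = np.linalg.inv(covs)                    # (N, 3, 3), batched inverse
    out = np.zeros((voxel_centers.shape[0], semantics.shape[1]))
    for j, x in enumerate(voxel_centers):
        d = x - means                                 # (N, 3) offsets to all Gaussians
        near = np.linalg.norm(d, axis=1) < radius     # neighborhood mask (sparsity)
        if not near.any():
            continue                                  # empty space stays zero
        dn = d[near]                                  # (K, 3) offsets to neighbors
        # Squared Mahalanobis distance under each neighboring Gaussian
        maha = np.einsum('ki,kij,kj->k', dn, inv_covs[near], dn)
        w = np.exp(-0.5 * maha)                       # (K,) unnormalized densities
        out[j] = w @ semantics[near]                  # density-weighted semantic sum
    return out

# Toy usage: 8 random Gaussians splatted onto 5 query positions
rng = np.random.default_rng(0)
N, C = 8, 4
means = rng.uniform(-1, 1, (N, 3))
covs = np.tile(np.eye(3) * 0.1, (N, 1, 1))
semantics = rng.normal(size=(N, C))
voxels = rng.uniform(-1, 1, (5, 3))
print(gaussian_to_voxel_splat(means, covs, semantics, voxels).shape)  # (5, 4)
```

The neighborhood mask is the point of the design: because each query position touches only a handful of Gaussians rather than a dense feature grid, memory and compute scale with the number of Gaussians instead of the scene volume.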
Pages: 376-393
Number of pages: 18
Related Papers
50 items in total
  • [21] Vision-based posing of 3D virtual actors
    Vaidya, AS
    Shaji, A
    Chandran, S
    COMPUTER VISION - ACCV 2006, PT II, 2006, 3852 : 91 - 100
  • [22] Adaptive vision-based crack detection using 3D scene reconstruction for condition assessment of structures
    Jahanshahi, Mohammad R.
    Masri, Sami F.
    AUTOMATION IN CONSTRUCTION, 2012, 22 : 567 - 576
  • [23] 3D VSG: Long-term Semantic Scene Change Prediction through 3D Variable Scene Graphs
    Looper, Samuel
    Rodriguez-Puigvert, Javier
    Siegwart, Roland
    Cadena, Cesar
    Schmid, Lukas
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), 2023, : 8179 - 8186
  • [24] Prediction of the scene quality for stereo vision-based autonomous navigation
    Roggeman, Helene
    Marzat, Julien
    Bernard-Brunel, Anthelme
    Le Besnerais, Guy
    IFAC PAPERSONLINE, 2016, 49 (15): : 94 - 99
  • [25] Microassembly of Complex and Solid 3D MEMS by 3D Vision-based Control
    Tamadazte, Brahim
    Le Fort-Piat, Nadine
    Dembele, Sounkalo
    Marchand, Eric
2009 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, 2009, : 3284 - 3289
  • [26] Stereo Vision-Based Semantic 3D Object and Ego-Motion Tracking for Autonomous Driving
    Li, Peiliang
    Qin, Tong
    Shen, Shaojie
    COMPUTER VISION - ECCV 2018, PT II, 2018, 11206 : 664 - 679
  • [27] Semantic-based Rules for 3D Scene Adaptation
    Bilasco, Ioan Marius
    Villanova-Oliver, Marlene
    Gensel, Jerome
    Martin, Herve
    WEB3D 2007 - 12TH INTERNATIONAL CONFERENCE ON 3D WEB TECHNOLOGY, PROCEEDINGS, 2007, : 97 - 100
  • [28] Comparing Vision-based to Sonar-based 3D Reconstruction
    Frank, Netanel
    Wolf, Lior
    Olshansky, Danny
    Boonman, Arjan
    Yovel, Yossi
    2020 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL PHOTOGRAPHY (ICCP), 2020,
  • [29] A WiFi Vision-based 3D Human Mesh Reconstruction
    Wang, Yichao
    Ren, Yili
    Chen, Yingying
    Yang, Jie
PROCEEDINGS OF THE 28TH ANNUAL INTERNATIONAL CONFERENCE ON MOBILE COMPUTING AND NETWORKING, ACM MOBICOM 2022, 2022, : 814 - 816
  • [30] Vision-Based System for 3D Tower Crane Monitoring
    Gutierrez, Ricardo
    Magallon, Monica
    Hernandez Jr, Danilo Caceres
    IEEE SENSORS JOURNAL, 2021, 21 (10) : 11935 - 11945