GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

Cited by: 0
Authors
Huang, Yuanhui [1 ]
Zheng, Wenzhao [1 ,2 ]
Zhang, Yunpeng [3 ]
Zhou, Jie [1 ]
Lu, Jiwen [1 ]
Institutions
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Univ Calif Berkeley, Berkeley, CA 94720 USA
[3] PhiGent Robot, Beijing, Peoples R China
Source
Funding
National Natural Science Foundation of China
Keywords
3D Occupancy Prediction; 3D Gaussian Splatting; Autonomous Driving; Priors
DOI
10.1007/978-3-031-73383-3_22
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
3D semantic occupancy prediction aims to obtain fine-grained 3D geometry and semantics of the surrounding scene and is an important task for the robustness of vision-centric autonomous driving. Most existing methods employ dense grids such as voxels as scene representations, which ignore the sparsity of occupancy and the diversity of object scales and thus lead to an unbalanced allocation of resources. To address this, we propose an object-centric representation that describes 3D scenes with sparse 3D semantic Gaussians, where each Gaussian represents a flexible region of interest and its semantic features. We aggregate information from images through the attention mechanism and iteratively refine the properties of the 3D Gaussians, including position, covariance, and semantics. We then propose an efficient Gaussian-to-voxel splatting method to generate 3D occupancy predictions, which aggregates only the neighboring Gaussians for a given position. We conduct extensive experiments on the widely adopted nuScenes and KITTI-360 datasets. Experimental results demonstrate that GaussianFormer achieves performance comparable to state-of-the-art methods with only 17.8%-24.8% of their memory consumption. Code is available at: https://github.com/huang-yh/GaussianFormer.
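The Gaussian-to-voxel splatting step described in the abstract can be pictured with a minimal sketch: each voxel center queries only nearby Gaussians and accumulates their per-class semantic logits, weighted by the Gaussian density at that point. The function name, the fixed-radius neighborhood test, and the dense NumPy loop below are illustrative assumptions for exposition, not the paper's released CUDA implementation.

```python
import numpy as np

def gaussian_to_voxel_splat(means, covs, semantics, voxel_centers, radius=2.0):
    """Sketch of Gaussian-to-voxel splatting (hypothetical helper).

    means         : (N, 3)    Gaussian centers
    covs          : (N, 3, 3) positive-definite covariance matrices
    semantics     : (N, C)    per-Gaussian semantic logits
    voxel_centers : (M, 3)    query positions on the voxel grid
    radius        : only Gaussians within this distance contribute
    Returns (M, C) semantic occupancy logits.
    """
    inv_covs = np.linalg.inv(covs)                    # (N, 3, 3), batched inverse
    out = np.zeros((voxel_centers.shape[0], semantics.shape[1]))
    for j, x in enumerate(voxel_centers):
        d = x - means                                 # (N, 3) offsets to all Gaussians
        near = np.linalg.norm(d, axis=1) < radius     # neighborhood mask (sparsity)
        if not near.any():
            continue                                  # empty space stays zero
        dn = d[near]                                  # (K, 3) offsets to neighbors
        # Squared Mahalanobis distance under each neighboring Gaussian
        maha = np.einsum('ki,kij,kj->k', dn, inv_covs[near], dn)
        w = np.exp(-0.5 * maha)                       # (K,) unnormalized densities
        out[j] = w @ semantics[near]                  # density-weighted semantic sum
    return out

# Toy usage: 8 random Gaussians splatted onto 5 query positions
rng = np.random.default_rng(0)
N, C = 8, 4
means = rng.uniform(-1, 1, (N, 3))
covs = np.tile(np.eye(3) * 0.1, (N, 1, 1))
semantics = rng.normal(size=(N, C))
voxels = rng.uniform(-1, 1, (5, 3))
print(gaussian_to_voxel_splat(means, covs, semantics, voxels).shape)  # (5, 4)
```

The neighborhood mask is the point of the design: because each query position touches only a handful of Gaussians rather than a dense feature grid, memory and compute scale with the number of Gaussians instead of the scene volume.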
Pages: 376-393
Number of pages: 18
Related Papers
50 items in total
  • [21] Vision-based posing of 3D virtual actors
    Vaidya, AS
    Shaji, A
    Chandran, S
    COMPUTER VISION - ACCV 2006, PT II, 2006, 3852 : 91 - 100
  • [22] Adaptive vision-based crack detection using 3D scene reconstruction for condition assessment of structures
    Jahanshahi, Mohammad R.
    Masri, Sami F.
    AUTOMATION IN CONSTRUCTION, 2012, 22 : 567 - 576
  • [23] 3D VSG: Long-term Semantic Scene Change Prediction through 3D Variable Scene Graphs
    Looper, Samuel
    Rodriguez-Puigvert, Javier
    Siegwart, Roland
    Cadena, Cesar
    Schmid, Lukas
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), 2023, : 8179 - 8186
  • [24] Prediction of the scene quality for stereo vision-based autonomous navigation
    Roggeman, Helene
    Marzat, Julien
    Bernard-Brunel, Anthelme
    Le Besnerais, Guy
    IFAC PAPERSONLINE, 2016, 49 (15): : 94 - 99
  • [25] Microassembly of Complex and Solid 3D MEMS by 3D Vision-based Control
    Tamadazte, Brahim
    Le Fort-Piat, Nadine
    Dembele, Sounkalo
    Marchand, Eric
2009 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, 2009, : 3284 - 3289
  • [26] Stereo Vision-Based Semantic 3D Object and Ego-Motion Tracking for Autonomous Driving
    Li, Peiliang
    Qin, Tong
    Shen, Shaojie
    COMPUTER VISION - ECCV 2018, PT II, 2018, 11206 : 664 - 679
  • [27] Semantic-based Rules for 3D Scene Adaptation
    Bilasco, Ioan Marius
    Villanova-Oliver, Marlene
    Gensel, Jerome
    Martin, Herve
    WEB3D 2007 - 12TH INTERNATIONAL CONFERENCE ON 3D WEB TECHNOLOGY, PROCEEDINGS, 2007, : 97 - 100
  • [28] Comparing Vision-based to Sonar-based 3D Reconstruction
    Frank, Netanel
    Wolf, Lior
    Olshansky, Danny
    Boonman, Arjan
    Yovel, Yossi
    2020 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL PHOTOGRAPHY (ICCP), 2020,
  • [29] A WiFi Vision-based 3D Human Mesh Reconstruction
    Wang, Yichao
    Ren, Yili
    Chen, Yingying
    Yang, Jie
PROCEEDINGS OF THE 28TH ANNUAL INTERNATIONAL CONFERENCE ON MOBILE COMPUTING AND NETWORKING, ACM MOBICOM 2022, 2022, : 814 - 816
  • [30] Vision-Based System for 3D Tower Crane Monitoring
    Gutierrez, Ricardo
    Magallon, Monica
    Hernandez Jr, Danilo Caceres
    IEEE SENSORS JOURNAL, 2021, 21 (10) : 11935 - 11945