SOGDet: Semantic-Occupancy Guided Multi-View 3D Object Detection

Cited by: 0

Authors:

Zhou, Qiu

Cao, Jinming [1]

Leng, Hanchao [2]

Yin, Yifang [3]

Kun, Yu [2]

Zimmermann, Roger [1]

Affiliations:

[1] Natl Univ Singapore, Singapore, Singapore

[2] Xiaomi Car, Singapore, Singapore

[3] ASTAR, I2R, Singapore, Singapore

Source:

THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7 | 2024

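Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes: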

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In the field of autonomous driving, accurate and comprehensive perception of the 3D environment is crucial. Bird's Eye View (BEV) based methods have emerged as a promising solution for 3D object detection using multi-view images as input. However, existing 3D object detection methods often ignore the physical context in the environment, such as sidewalks and vegetation, resulting in sub-optimal performance. In this paper, we propose a novel approach called SOGDet (Semantic-Occupancy Guided Multi-view 3D Object Detection), which leverages a 3D semantic-occupancy branch to improve the accuracy of 3D object detection. In particular, the physical context modeled by semantic occupancy helps the detector perceive scenes more holistically. SOGDet is flexible to use and can be seamlessly integrated with most existing BEV-based methods. To evaluate its effectiveness, we apply this approach to several state-of-the-art baselines and conduct extensive experiments on the exclusive nuScenes dataset. Our results show that SOGDet consistently enhances the performance of three baseline methods in terms of nuScenes Detection Score (NDS) and mean Average Precision (mAP). This indicates that the combination of 3D object detection and 3D semantic occupancy leads to a more comprehensive perception of the 3D environment, thereby helping to build more robust autonomous driving systems. The code is available at: https://github.com/zhouqiu/SOGDet.

Citation

Pages: 7668-7676

Page count: 9
