OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection

Authors
Zhang, Hu [1]
Ku, Jianhua [4]
Tang, Tao [5]
Sun, Haiyang [6]
Huang, Xin [2]
Huang, Zi [2]
Yu, Kaicheng [3]
Affiliations
[1] CSIRO DATA61, Sydney, NSW, Australia
[2] Univ Queensland, Brisbane, Qld, Australia
[3] Westlake Univ, Hangzhou, Peoples R China
[4] Alibaba, DAMO Acad, Beijing, Peoples R China
[5] Sun Yat Sen Univ, Shenzhen Campus, Shenzhen, Peoples R China
[6] LiAuto Inc, Beijing, Peoples R China
Source
Keywords
OpenSight; Open-vocabulary; 3D object detection; VoxelNet
DOI
10.1007/978-3-031-72907-2_1
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Traditional LiDAR-based object detection research focuses primarily on closed-set scenarios and therefore falls short in complex real-world applications. Directly transferring existing 2D open-vocabulary models to LiDAR using a few known LiDAR classes, however, tends to over-fit: the resulting model still predicts the known classes even when presented with an object from a novel category. In this paper, we propose OpenSight, a more advanced 2D-3D modeling framework for LiDAR-based open-vocabulary detection. OpenSight uses 2D-3D geometric priors to first discern and localize generic objects, and then interprets the specific semantics of the detected objects. The process begins by generating 2D boxes for generic objects from the camera images that accompany the LiDAR data. These 2D boxes, together with the LiDAR points, are then lifted back into LiDAR space to estimate the corresponding 3D boxes. For better generic object perception, the framework integrates both temporal and spatial awareness constraints. Temporal awareness correlates the predicted 3D boxes across consecutive timestamps and recalibrates missed or inaccurate boxes. Spatial awareness randomly places some "precisely" estimated 3D boxes at varying distances, increasing the visibility of generic objects. To interpret the specific semantics of detected objects, we develop a cross-modal alignment and fusion module that first aligns 3D features with 2D image embeddings and then fuses the aligned 3D-2D features for semantic decoding. Our experiments indicate that our method establishes state-of-the-art open-vocabulary performance on widely used 3D detection benchmarks and effectively identifies objects from new categories of interest.
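The step of lifting a 2D box and LiDAR points into a 3D box can be illustrated with a minimal sketch. This is not the OpenSight implementation: it assumes a simple pinhole camera, points already expressed in the camera frame, and an axis-aligned box fitted to every point that projects inside the 2D box. The function name `lift_box_to_3d` and all parameters are hypothetical, chosen only for illustration.

```python
import numpy as np


def lift_box_to_3d(points_cam, K, box_2d):
    """Fit an axis-aligned 3D box to the points that project into a 2D box.

    Hypothetical helper, not taken from the paper.
    points_cam: (N, 3) LiDAR points already transformed into the camera frame.
    K: (3, 3) pinhole intrinsic matrix.
    box_2d: (x_min, y_min, x_max, y_max) in pixels.
    Returns (center, size) as length-3 arrays, or None if no point is inside.
    """
    x_min, y_min, x_max, y_max = box_2d

    # Keep only points in front of the camera so the projection is valid.
    pts = points_cam[points_cam[:, 2] > 0]

    # Pinhole projection: homogeneous pixel coordinates, then divide by depth.
    uvw = pts @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]

    # Select the points whose projection lies inside the 2D box (the frustum).
    inside = (
        (uv[:, 0] >= x_min) & (uv[:, 0] <= x_max)
        & (uv[:, 1] >= y_min) & (uv[:, 1] <= y_max)
    )
    frustum = pts[inside]
    if frustum.shape[0] == 0:
        return None

    # Axis-aligned bounding box of the selected points.
    lo, hi = frustum.min(axis=0), frustum.max(axis=0)
    return (lo + hi) / 2.0, hi - lo
```

A real pipeline would likely also need to separate foreground points from background points inside the frustum and estimate the box orientation; the sketch omits both for brevity.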
Pages: 1 - 19
Number of pages: 19