OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation

Authors
Wang, Zhenyu [1]
Li, Yali [1]
Liu, Taichi [2]
Zhao, Hengshuang [3]
Wang, Shengjin [1]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing Natl Res Ctr Informat Sci & Technol BNRis, Beijing, Peoples R China
[2] Rutgers State Univ, New Brunswick, NJ 08901 USA
[3] Univ Hong Kong, Pok Fu Lam, Hong Kong, Peoples R China
Source
Funding
National Natural Science Foundation of China
Keywords
DOI
10.1007/978-3-031-72970-6_5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the current state of 3D object detection research, the severe scarcity of annotated 3D data, substantial disparities across data modalities, and the absence of a unified architecture have impeded progress towards universality. In this paper, we propose OV-Uni3DETR, a unified open-vocabulary 3D detector via cycle-modality propagation. Compared with existing 3D detectors, OV-Uni3DETR offers distinct advantages: 1) Open-vocabulary 3D detection: During training, it leverages various accessible data, especially extensive 2D detection images, to boost training diversity. During inference, it can detect both seen and unseen classes. 2) Modality unifying: It accommodates input data from any given modality, handling scenarios with disparate modalities or missing sensor information and thereby supporting test-time modality switching. 3) Scene unifying: It provides a unified multi-modal model architecture for diverse scenes collected by distinct sensors. Specifically, we propose cycle-modality propagation, which propagates knowledge between the 2D and 3D modalities, to support these functionalities. 2D semantic knowledge from large-vocabulary learning guides novel class discovery in the 3D domain, and 3D geometric knowledge provides localization supervision for 2D detection images. OV-Uni3DETR achieves state-of-the-art performance across various scenarios, surpassing existing methods by more than 6% on average. Its performance using only RGB images is on par with, or even surpasses, that of previous point-cloud-based methods. Code is available at https://github.com/zhenyuw16/Uni3DETR.
Pages: 73-89
Number of pages: 17