CapDet: Unifying Dense Captioning and Open-World Detection Pretraining

Cited by: 4
Authors
Long, Yanxin [1]
Wen, Youpeng [1]
Han, Jianhua [2]
Xu, Hang [2]
Ren, Pengzhen [1]
Zhang, Wei [2]
Zhao, Shen [1]
Liang, Xiaodan [1, 3]
Affiliations
[1] Sun Yat Sen Univ, Shenzhen Campus, Shenzhen, Peoples R China
[2] Huawei Noahs Ark Lab, Montreal, PQ, Canada
[3] MBZUAI, Abu Dhabi, U Arab Emirates
DOI
10.1109/CVPR52729.2023.01462
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Benefiting from large-scale vision-language pre-training on image-text pairs, open-world detection methods have shown superior generalization ability under zero-shot or few-shot detection settings. However, existing methods still require a pre-defined category space at inference time and predict only objects that belong to that space. To introduce a "real" open-world detector, we propose CapDet, a method that can either predict under a given category list or directly generate the category of each predicted bounding box. Specifically, we unify open-world detection and dense captioning in a single, effective framework by adding a dense captioning head that generates region-grounded captions. Adding the captioning task in turn improves the generalization of detection, since the captioning dataset covers more concepts. Experimental results show that, by unifying the dense captioning task, CapDet obtains significant performance improvements over the baseline method on LVIS (1203 classes), e.g., +2.1% mAP on LVIS rare classes. CapDet also achieves state-of-the-art performance on dense captioning tasks, e.g., 15.44% mAP on VG V1.2 and 13.98% on the VG-COCO dataset.
Pages: 15233-15243
Page count: 11