Dense Distinct Query for End-to-End Object Detection

Cited by: 106
Authors
Zhang, Shilong [1 ,3 ]
Wang, Xinjiang [2 ]
Wang, Jiaqi [1 ]
Pang, Jiangmiao [1 ]
Lyu, Chengqi [1 ]
Zhang, Wenwei [1 ,4 ]
Luo, Ping [1 ,3 ]
Chen, Kai [1 ]
Affiliations
[1] Shanghai AI Lab, Shanghai, Peoples R China
[2] SenseTime Res, Hong Kong, Peoples R China
[3] Univ Hong Kong, Hong Kong, Peoples R China
[4] Nanyang Technol Univ, S Lab, Singapore, Singapore
Funding
National Key Research and Development Program of China
DOI
10.1109/CVPR52729.2023.00708
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
One-to-one label assignment in object detection has obviated the need for non-maximum suppression (NMS) as postprocessing and makes the pipeline end-to-end. However, it creates a new dilemma: the widely used sparse queries cannot guarantee high recall, while dense queries inevitably introduce more similar queries and encounter optimization difficulties. Since both sparse and dense queries are problematic, what are the expected queries in end-to-end object detection? This paper shows that the solution should be Dense Distinct Queries (DDQ). Concretely, we first lay dense queries like traditional detectors and then select distinct ones for one-to-one assignments. DDQ blends the advantages of traditional and recent end-to-end detectors and significantly improves the performance of various detectors including FCN, R-CNN, and DETRs. Most impressively, DDQ-DETR achieves 52.1 AP on the MS-COCO dataset within 12 epochs using a ResNet-50 backbone, outperforming all existing detectors in the same setting. DDQ also shares the benefit of end-to-end detectors in crowded scenes and achieves 93.8 AP on CrowdHuman. We hope DDQ can inspire researchers to consider the complementarity between traditional methods and end-to-end detectors. The source code can be found at https://github.com/jshilong/DDQ.
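The "lay dense queries, then select distinct ones" step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how distinctness is decided, so the greedy IoU-based filter below is an assumed stand-in and not necessarily the authors' method. The function name `select_distinct_queries`, the threshold `0.7`, and the cap of `300` queries are hypothetical choices made only for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_distinct_queries(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.7,  # assumed value, not taken from the source
    max_queries: int = 300,      # assumed value, not taken from the source
) -> List[int]:
    """Return indices of dense queries that are mutually distinct.

    Queries are visited from highest to lowest score; a query is kept only
    if its box overlaps every already-kept box by at most `iou_threshold`.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
            if len(kept) == max_queries:
                break
    return kept


# Toy example: four dense queries, two of which nearly duplicate the first.
dense_boxes = [
    (0.0, 0.0, 10.0, 10.0),
    (0.5, 0.5, 10.5, 10.5),    # near-duplicate of query 0
    (50.0, 50.0, 60.0, 60.0),  # a separate object
    (0.0, 0.0, 10.0, 10.5),    # near-duplicate of query 0
]
dense_scores = [0.9, 0.8, 0.7, 0.6]
distinct = select_distinct_queries(dense_boxes, dense_scores)
print(distinct)
```

In this sketch only the surviving indices would be passed on to one-to-one label assignment; the dense set still provides the recall, while the filter removes the near-identical candidates that the abstract identifies as the source of optimization difficulty.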
Pages: 7329-7338
Number of pages: 10