CF-DETR: Coarse-to-Fine Transformers for End-to-End Object Detection

Cited by: 0
Authors
Cao, Xipeng [1, 2]
Yuan, Peng [2]
Feng, Bailan [2]
Niu, Kun [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
[2] Huawei Noahs Ark Lab, Montreal, PQ, Canada
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The recently proposed DEtection TRansformer (DETR) achieves promising performance for end-to-end object detection. However, it performs relatively poorly on small objects and suffers from slow convergence. We observe that DETR performs surprisingly well, even on small objects, when Average Precision (AP) is measured at lower Intersection-over-Union (IoU) thresholds. Motivated by this observation, we propose a simple way to improve DETR by refining the coarse features and predicted locations. Specifically, we propose a novel Coarse-to-Fine (CF) decoder layer consisting of a coarse layer and a carefully designed fine layer. Within each CF decoder layer, the extracted local information (region-of-interest features) is introduced into the flow of global context information from the coarse layer to refine and enrich the object query features via the fine layer. In the fine layer, multi-scale information is explored and exploited via the Adaptive Scale Fusion (ASF) module and the Local Cross-Attention (LCA) module. The multi-scale information can be further enhanced by another proposed module, the Transformer Enhanced FPN (TEF), to improve performance. With our proposed framework (named CF-DETR), the localization accuracy of objects, especially small objects, is substantially improved. As a byproduct, the slow convergence issue of DETR is also addressed. The effectiveness of CF-DETR is validated via extensive experiments on the COCO benchmark. CF-DETR achieves state-of-the-art performance among end-to-end detectors, e.g., 47.8 AP using ResNet-50 with 36 epochs (the standard 3x training schedule).
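The observation about lower IoU thresholds can be illustrated with a minimal sketch. It is not taken from the paper: the box coordinates, the helper names `iou` and `is_true_positive`, and the thresholds 0.3 and 0.5 are assumptions chosen for illustration. It shows how a coarsely localized prediction on a small object can count as a correct detection at a low IoU threshold while failing a stricter one, which is the kind of gap that refining predicted locations aims to close.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold):
    """A prediction matches a ground-truth box if their IoU reaches the threshold."""
    return iou(pred, gt) >= threshold


# Hypothetical small object (20 x 20 pixels) and a prediction shifted by 5 pixels.
gt_box = (100, 100, 120, 120)
pred_box = (105, 105, 125, 125)

overlap = iou(pred_box, gt_box)
print(f"IoU = {overlap:.3f}")
print("match at IoU 0.3:", is_true_positive(pred_box, gt_box, 0.3))
print("match at IoU 0.5:", is_true_positive(pred_box, gt_box, 0.5))
```

In this hypothetical case a 5-pixel offset is enough to push a 20 x 20 box below the 0.5 threshold, whereas the same offset on a much larger box would barely change its IoU.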
Pages: 185-193
Number of pages: 9
Related papers
50 in total
  • [31] Recursive coarse-to-fine localization for fast object detection
    Na, I.S. (ypencil@hanmail.net), 1600, Science and Engineering Research Support Society, 20 Virginia Court, Sandy Bay, Tasmania, Australia (07):
  • [32] FOF: a fine-grained object detection and feature extraction end-to-end network
    Shen, Wenzhong
    Chen, Jinpeng
    Shao, Jie
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2023, 12 (02)
  • [33] A coarse-to-fine approach for fast deformable object detection
    Pedersoli, Marco
    Vedaldi, Andrea
    Gonzalez, Jordi
    Roca, Xavier
    PATTERN RECOGNITION, 2015, 48 (05) : 1844 - 1853
  • [34] Casting-DETR: An End-to-End Network for Casting Surface Defect Detection
    Pu, Quan-cheng
    Hui, Zhang
    Xu, Xiang-rong
    Zhang, Long
    Gao, Ju
    Rodic, Aleksandar
    Petrovic, Petar B.
    Wang, Hai-yan
    Xu, Shan-shan
    Wang, Zhi-xiong
    INTERNATIONAL JOURNAL OF METALCASTING, 2024, 18 (04) : 3152 - 3165
  • [35] MT-DETR: Robust End-to-end Multimodal Detection with Confidence Fusion
    Chu, Shih-Yun
    Lee, Ming-Sui
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 5241 - 5250
  • [36] Recursive Coarse-to-Fine Localization for Fast Object Detection
    Pedersoli, Marco
    Gonzalez, Jordi
    Bagdanov, Andrew D.
    Villanueva, Juan J.
    COMPUTER VISION - ECCV 2010, PT VI, 2010, 6316 : 280 - +
  • [37] Salient object detection using coarse-to-fine processing
    Zhou, Qiangqiang
    Zhang, Lin
    Zhao, Weidong
    Liu, Xianhui
    Chen, Yufei
    Wang, Zhicheng
    JOURNAL OF THE OPTICAL SOCIETY OF AMERICA A-OPTICS IMAGE SCIENCE AND VISION, 2017, 34 (03) : 370 - 383
  • [38] End-to-End Human-Gaze-Target Detection with Transformers
    Tu, Danyang
    Min, Xiongkuo
    Duan, Huiyu
    Guo, Guodong
    Zhai, Guangtao
    Shen, Wei
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 2192 - 2200
  • [39] VRDFormer: End-to-End Video Visual Relation Detection with Transformers
    Zheng, Sipeng
    Chen, Shizhe
    Jin, Qin
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 18814 - 18824
  • [40] DIMD-DETR: DDQ-DETR With Improved Metric Space for End-to-End Object Detector on Remote Sensing Aircrafts
    Liu, Huan
    Ren, Xuefeng
    Gan, Yang
    Chen, Yongming
    Lin, Ping
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2025, 18 : 4498 - 4509