CF-DETR: Coarse-to-Fine Transformers for End-to-End Object Detection

Cited by: 0

Authors:

Cao, Xipeng [1,2]

Yuan, Peng [2]

Feng, Bailan [2]

Niu, Kun [1]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China

[2] Huawei Noahs Ark Lab, Montreal, PQ, Canada

Source:

THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2022

Funding:

National Natural Science Foundation of China;

Keywords:

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The recently proposed DEtection TRansformer (DETR) achieves promising performance for end-to-end object detection. However, it has relatively lower detection performance on small objects and suffers from slow convergence. We observe that DETR performs surprisingly well even on small objects when Average Precision (AP) is measured at decreased Intersection-over-Union (IoU) thresholds. Motivated by this observation, we propose a simple way to improve DETR by refining the coarse features and predicted locations. Specifically, we propose a novel Coarse-to-Fine (CF) decoder layer consisting of a coarse layer and a carefully designed fine layer. Within each CF decoder layer, the extracted local information (region-of-interest features) is introduced into the flow of global context information from the coarse layer to refine and enrich the object query features via the fine layer. In the fine layer, the multi-scale information can be fully explored and exploited via the Adaptive Scale Fusion (ASF) module and the Local Cross-Attention (LCA) module. The multi-scale information can also be enhanced by another proposed Transformer Enhanced FPN (TEF) module to further improve the performance. With our proposed framework (named CF-DETR), the localization accuracy of objects (especially small objects) can be largely improved. As a byproduct, the slow convergence issue of DETR can also be addressed. The effectiveness of CF-DETR is validated via extensive experiments on the COCO benchmark. CF-DETR achieves state-of-the-art performance among end-to-end detectors, e.g., achieving 47.8 AP using ResNet-50 with 36 epochs in the standard 3x training schedule.
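The observation about decreased IoU thresholds can be illustrated with a small sketch. The code below is not taken from the paper: the box coordinates and the helper names `iou` and `is_match` are hypothetical, chosen only to show how a coarsely localized prediction for a small object can count as a match at a relaxed IoU threshold while failing a stricter one, and how a refined box recovers the match.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, gt, threshold):
    """A prediction counts as a true positive if its IoU reaches the threshold."""
    return iou(pred, gt) >= threshold


# Hypothetical small object (10x10 pixels) and two illustrative predictions.
gt = (10.0, 10.0, 20.0, 20.0)
coarse = (13.0, 13.0, 23.0, 23.0)   # shifted by 3 pixels in each direction
refined = (11.0, 11.0, 21.0, 21.0)  # shifted by 1 pixel in each direction

for name, box in (("coarse", coarse), ("refined", refined)):
    print(name, round(iou(box, gt), 3),
          "match@0.3:", is_match(box, gt, 0.3),
          "match@0.5:", is_match(box, gt, 0.5))
```

For a small box, a shift of a few pixels already removes a large share of the overlap, which is one plausible reading of why AP at lower IoU thresholds looks better for small objects and why refining the predicted locations helps.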


Pages: 185-193

Page count: 9
