CF-DETR: Coarse-to-Fine Transformers for End-to-End Object Detection

Cited by: 0
Authors
Cao, Xipeng [1 ,2 ]
Yuan, Peng [2 ]
Feng, Bailan [2 ]
Niu, Kun [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
[2] Huawei Noah's Ark Lab, Montreal, PQ, Canada
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
CLC classification number
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The recently proposed DEtection TRansformer (DETR) achieves promising performance for end-to-end object detection. However, it delivers relatively low detection accuracy on small objects and suffers from slow convergence. We observe that DETR performs surprisingly well even on small objects when Average Precision (AP) is measured at lowered Intersection-over-Union (IoU) thresholds. Motivated by this observation, we propose a simple way to improve DETR by refining its coarse features and predicted locations. Specifically, we introduce a novel Coarse-to-Fine (CF) decoder layer composed of a coarse layer and a carefully designed fine layer. Within each CF decoder layer, local information extracted as region-of-interest (RoI) features is injected into the flow of global context information from the coarse layer, refining and enriching the object query features via the fine layer. In the fine layer, multi-scale information is fully explored and exploited through the Adaptive Scale Fusion (ASF) module and the Local Cross-Attention (LCA) module, and it is further strengthened by the proposed Transformer Enhanced FPN (TEF) module. With the resulting framework, named CF-DETR, the localization accuracy of objects, especially small objects, is largely improved, and the slow convergence of DETR is alleviated as a byproduct. The effectiveness of CF-DETR is validated via extensive experiments on the COCO benchmark: it achieves state-of-the-art performance among end-to-end detectors, e.g., 47.8 AP with a ResNet-50 backbone under the standard 3x (36-epoch) training schedule.
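For orientation only, below is a minimal PyTorch sketch of the coarse-to-fine idea described in the abstract: a standard (coarse) decoder layer produces updated queries and coarse boxes, and a fine step lets each query attend over RoI features pooled from those boxes. All module and argument names here are hypothetical, torchvision's roi_align stands in for the paper's RoI extraction, and the ASF and TEF components are omitted; this is a sketch under those assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a single Coarse-to-Fine (CF) decoder layer.
# Names and structure are illustrative; ASF and TEF modules are omitted.
import torch
import torch.nn as nn
from torchvision.ops import roi_align


class CoarseToFineDecoderLayer(nn.Module):
    def __init__(self, d_model=256, n_heads=8, roi_size=7):
        super().__init__()
        self.roi_size = roi_size
        # Coarse layer: a standard DETR-style decoder layer (self-attn + cross-attn + FFN).
        self.coarse = nn.TransformerDecoderLayer(
            d_model, n_heads, dim_feedforward=2048, batch_first=True)
        self.box_head = nn.Linear(d_model, 4)  # coarse boxes as normalized (cx, cy, w, h)
        # Fine layer: each query attends to its own RoI tokens
        # (a stand-in for the paper's Local Cross-Attention).
        self.local_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, queries, memory, feature_map):
        # queries: (B, N, C) object queries; memory: (B, HW, C) encoder tokens
        # feature_map: (B, C, H, W) feature used for RoI extraction (C == d_model assumed)
        B, N, C = queries.shape
        H, W = feature_map.shape[-2:]

        # --- coarse stage: global context update and coarse box prediction ---
        q = self.coarse(queries, memory)
        boxes = self.box_head(q).sigmoid()  # (B, N, 4), normalized

        # Convert to absolute (x1, y1, x2, y2) RoIs in feature-map coordinates.
        cx, cy, w, h = boxes.unbind(-1)
        rois = torch.stack([(cx - w / 2) * W, (cy - h / 2) * H,
                            (cx + w / 2) * W, (cy + h / 2) * H], dim=-1)
        roi_list = [rois[b] for b in range(B)]  # one (N, 4) tensor per image

        # --- fine stage: refine each query with its local RoI feature ---
        roi_feat = roi_align(feature_map, roi_list, self.roi_size, aligned=True)  # (B*N, C, r, r)
        roi_feat = roi_feat.flatten(2).transpose(1, 2)                            # (B*N, r*r, C)
        q_flat = q.reshape(B * N, 1, C)
        refined, _ = self.local_attn(q_flat, roi_feat, roi_feat)  # query attends to RoI tokens
        q = self.norm(q + refined.reshape(B, N, C))
        return q, boxes
```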
Pages: 185-193
Number of pages: 9
Related papers
50 records in total
  • [21] End-to-End Detection for Key Equipment in Natural Gas Station with DETR
    Liang, Xinyue
    Su, Huai
    Zhang, Jinjun
    He, Yuxuan
    Qin, Xiaodong
    Yang, Zhaoming
    ADVANCES IN CLEAN AND GREEN ENERGY SOLUTIONS: ICCGE 2024 PROCEEDINGS, 2025, 1333 : 43 - 54
  • [22] DITA: DETR with improved queries for end-to-end temporal action detection
    Lu, Chongkai
    Mak, Man-Wai
    NEUROCOMPUTING, 2024, 596
  • [23] VPDETR: End-to-End Vanishing Point DEtection TRansformers
    Chen, Taiyan
    Ying, Xianghua
    Yang, Jinfa
    Wang, Ruibin
    Guo, Ruohao
    Xing, Bowei
    Shi, Ji
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 2, 2024, : 1192 - 1200
  • [24] Enhanced Sparse Detection for End-to-End Object Detection
    Liao, Yongwei
    Chen, Gang
    Xu, Runnan
    IEEE ACCESS, 2022, 10 : 85630 - 85640
  • [25] FSH-DETR: An Efficient End-to-End Fire Smoke and Human Detection Based on a Deformable DEtection TRansformer (DETR)
    Liang, Tianyu
    Zeng, Guigen
    SENSORS, 2024, 24 (13)
  • [26] EOOD: End-to-end oriented object detection
    Zhang, Caiguang
    Chen, Zilong
    Xiong, Boli
    Ji, Kefeng
    Kuang, Gangyao
    NEUROCOMPUTING, 2025, 621
  • [27] Intrinsic Explainability for End-to-End Object Detection
    Fernandes, Luis
    Fernandes, Joao N. D.
    Calado, Mariana
    Pinto, Joao Ribeiro
    Cerqueira, Ricardo
    Cardoso, Jaime S.
    IEEE ACCESS, 2024, 12 : 2623 - 2634
  • [28] What Makes for End-to-End Object Detection?
    Sun, Peize
    Jiang, Yi
    Xie, Enze
    Shao, Wenqi
    Yuan, Zehuan
    Wang, Changhu
    Luo, Ping
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [29] FOF: a fine-grained object detection and feature extraction end-to-end network
    Shen, Wenzhong
    Chen, Jinpeng
    Shao, Jie
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2023, 12
  • [30] A Coarse-to-fine approach for fast deformable object detection
    Pedersoli, Marco
    Vedaldi, Andrea
    Gonzalez, Jordi
    2011 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2011, : 1353 - 1360