CF-DETR: Coarse-to-Fine Transformers for End-to-End Object Detection

Cited by: 0

Authors:

Cao, Xipeng [1,2]

Yuan, Peng [2]

Feng, Bailan [2]

Niu, Kun [1]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China

[2] Huawei Noahs Ark Lab, Montreal, PQ, Canada

Source:

THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2022

Funding:

National Natural Science Foundation of China

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The recently proposed DEtection TRansformer (DETR) achieves promising performance for end-to-end object detection. However, it has relatively lower detection performance on small objects and suffers from slow convergence. This paper observes that DETR performs surprisingly well even on small objects when Average Precision (AP) is measured at decreased Intersection-over-Union (IoU) thresholds. Motivated by this observation, we propose a simple way to improve DETR by refining the coarse features and predicted locations. Specifically, we propose a novel Coarse-to-Fine (CF) decoder layer consisting of a coarse layer and a carefully designed fine layer. Within each CF decoder layer, the extracted local information (region of interest feature) is introduced into the flow of global context information from the coarse layer to refine and enrich the object query features via the fine layer. In the fine layer, the multi-scale information can be fully explored and exploited via the Adaptive Scale Fusion (ASF) module and the Local Cross-Attention (LCA) module. The multi-scale information can also be enhanced by another proposed Transformer Enhanced FPN (TEF) module to further improve the performance. With our proposed framework (named CF-DETR), the localization accuracy of objects (especially small objects) can be largely improved. As a byproduct, the slow convergence issue of DETR can also be addressed. The effectiveness of CF-DETR is validated via extensive experiments on the COCO benchmark. CF-DETR achieves state-of-the-art performance among end-to-end detectors, e.g., achieving 47.8 AP using ResNet-50 with 36 epochs in the standard 3x training schedule.
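
The abstract's motivating observation rests on how the IoU threshold decides whether a prediction counts as correct. The following is a minimal sketch of that idea only, not the authors' evaluation code; the function names `iou` and `is_match` and the example boxes are hypothetical and chosen for illustration. It shows a coarsely localized box on a small object that is accepted at a relaxed threshold but rejected at a strict one.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes yield no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, gt, threshold):
    """A prediction counts as a true positive if its IoU reaches the threshold."""
    return iou(pred, gt) >= threshold


# Hypothetical small object (10x10 pixels) and a prediction shifted by 3 pixels.
gt_box = (10.0, 10.0, 20.0, 20.0)
pred_box = (13.0, 10.0, 23.0, 20.0)

for thr in (0.5, 0.75):
    print(f"IoU threshold {thr}: match = {is_match(pred_box, gt_box, thr)}")
```

For small objects a shift of a few pixels already lowers IoU substantially, which is consistent with the abstract's point that the coarse predictions are roughly right and mainly need location refinement.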


Pages: 185-193

Page count: 9

