Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection

Cited by: 0
Authors
Zhang, Manyuan [1, 2]
Song, Guanglu [2]
Liu, Yu [2]
Li, Hongsheng [1, 3, 4]
Affiliations
[1] Chinese Univ Hong Kong, Multimedia Lab, Hong Kong, Peoples R China
[2] SenseTime Res, Hong Kong, Peoples R China
[3] Ctr Perceptual & Interact Intelligence Ltd, Hong Kong, Peoples R China
[4] Shanghai AI Lab, Shanghai, Peoples R China
Funding
National Key Research and Development Program of China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The introduction of DETR represents a new paradigm for object detection. However, its decoder performs classification and box localization using shared queries and cross-attention layers, leading to suboptimal results. We observe that, even for the same object, different regions of interest in the visual feature map are suited to query classification and to box localization. Salient regions provide vital information for classification, while the boundaries around them are more favorable for box regression. Unfortunately, this spatial misalignment between the two tasks greatly hinders DETR's training. Therefore, in this work, we focus on decoupling the localization and classification tasks in DETR. To this end, we introduce a new design scheme called spatially decoupled DETR (SD-DETR), which includes a task-aware query generation module and a disentangled feature learning process. We carefully design the task-aware query initialization process and divide the cross-attention block in the decoder so that the task-aware queries can match different visual regions. We also observe a prediction misalignment between high classification confidence and precise localization, so we propose an alignment loss to further guide SD-DETR training. Extensive experiments show that our approach achieves a significant improvement over previous work on the MS COCO dataset; for instance, it improves the performance of Conditional DETR by 4.5 AP. By spatially disentangling the two tasks, our method overcomes the misalignment problem and greatly improves the performance of DETR for object detection.
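The abstract describes splitting the decoder's cross-attention so that classification and localization queries can attend to different regions of the feature map, plus an alignment loss between confidence and localization. The following is a minimal, framework-free sketch of that general idea under stated assumptions, not the SD-DETR implementation: a single attention head, a separate key projection per task, raw memory vectors used as values, and random inputs in place of a trained model. All function names are hypothetical. `alignment_penalty` is a generic squared gap between class confidence and IoU, used here as a stand-in; it is not the alignment loss defined in the paper.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def random_matrix(rows, cols):
    """Random Gaussian matrix standing in for learned parameters or features."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def cross_attention(queries, memory, w_key):
    """Single-head scaled dot-product cross-attention.

    queries: n_q vectors of size d; memory: n_pos feature vectors of size d
    (flattened feature-map positions); w_key: d x d key projection owned by
    this branch. Assumption: values are the raw memory vectors (no value
    projection). Returns attended features and each query's attention weights.
    """
    d = len(memory[0])
    keys = [matvec(w_key, m) for m in memory]
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        outputs.append([sum(w[j] * memory[j][c] for j in range(len(memory)))
                        for c in range(d)])
        weights.append(w)
    return outputs, weights


def decoupled_cross_attention(cls_queries, loc_queries, memory, w_key_cls, w_key_loc):
    """Hypothetical task-split cross-attention.

    Each task has its own queries and key projection, so the two branches can
    place their attention on different positions of the same feature map.
    """
    cls_feat, cls_attn = cross_attention(cls_queries, memory, w_key_cls)
    loc_feat, loc_attn = cross_attention(loc_queries, memory, w_key_loc)
    return cls_feat, loc_feat, cls_attn, loc_attn


def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def alignment_penalty(scores, pred_boxes, gt_boxes):
    """Generic stand-in: mean squared gap between class confidence and IoU."""
    gaps = [(s - box_iou(p, g)) ** 2 for s, p, g in zip(scores, pred_boxes, gt_boxes)]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    random.seed(0)
    d, n_pos, n_q = 4, 6, 2
    memory = random_matrix(n_pos, d)
    cls_q, loc_q = random_matrix(n_q, d), random_matrix(n_q, d)
    _, _, cls_attn, loc_attn = decoupled_cross_attention(
        cls_q, loc_q, memory, random_matrix(d, d), random_matrix(d, d))
    # The two branches may peak at different memory positions.
    print("cls argmax:", [w.index(max(w)) for w in cls_attn])
    print("loc argmax:", [w.index(max(w)) for w in loc_attn])
    print("penalty:", alignment_penalty([0.9], [(0, 0, 2, 2)], [(1, 1, 3, 3)]))
```

Giving each branch its own key projection is the simplest way to let the two attention maps diverge in this toy setting; a real detector would also use learned query, value, and output projections and multiple heads.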
Pages: 6578-6587
Page count: 10