Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection

Cited by: 0
Authors
Zhang, Manyuan [1, 2]
Song, Guanglu [2]
Liu, Yu [2]
Li, Hongsheng [1, 3, 4]
Affiliations
[1] Chinese Univ Hong Kong, Multimedia Lab, Hong Kong, Peoples R China
[2] SenseTime Res, Hong Kong, Peoples R China
[3] Ctr Perceptual & Interact Intelligence Ltd, Hong Kong, Peoples R China
[4] Shanghai AI Lab, Shanghai, Peoples R China
Funding
National Key R&D Program of China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The introduction of DETR represents a new paradigm for object detection. However, its decoder performs classification and box localization using shared queries and cross-attention layers, which leads to suboptimal results. We observe that different regions of the visual feature map are suited to query classification and to box localization, even for the same object: salient regions provide vital information for classification, while the boundaries around them are more useful for box regression. This spatial misalignment between the two tasks greatly hinders DETR's training. In this work, we therefore focus on decoupling the localization and classification tasks in DETR. To this end, we introduce a new design scheme called spatially decoupled DETR (SD-DETR), which includes a task-aware query generation module and a disentangled feature learning process. We carefully design the task-aware query initialization process and divide the cross-attention block in the decoder so that the task-aware queries can attend to different visual regions. We also observe a prediction misalignment between high classification confidence and precise localization, so we propose an alignment loss to further guide the training of spatially decoupled DETR. Extensive experiments show that our approach achieves significant improvements over previous work on the MS COCO dataset; for instance, it improves the performance of Conditional DETR by 4.5 AP. By spatially disentangling the two tasks, our method overcomes the misalignment problem and greatly improves the performance of DETR for object detection.
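The abstract gives no implementation details, but the general idea of dividing the decoder's cross-attention so that a classification query and a localization query attend to the visual features independently can be illustrated with a toy sketch. This is a hypothetical, simplified illustration and not the SD-DETR implementation: it assumes single-head scaled dot-product attention with no learned projections, no positional embeddings, and no alignment loss. The function names `cross_attention` and `decoupled_cross_attention` and the toy `memory` values are assumptions made for illustration only.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(
    query: Vector, keys: List[Vector], values: List[Vector]
) -> Tuple[Vector, List[float]]:
    """Single-head scaled dot-product attention of one query over keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Weighted sum of the value vectors, computed one output dimension at a time.
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return out, weights


def decoupled_cross_attention(
    cls_query: Vector, loc_query: Vector, memory: List[Vector]
) -> Tuple[Vector, Vector, List[float], List[float]]:
    """Hypothetical decoupled step: each task-aware query attends to the same memory
    in its own attention pass, so the two tasks do not share attention weights."""
    cls_out, cls_weights = cross_attention(cls_query, memory, memory)
    loc_out, loc_weights = cross_attention(loc_query, memory, memory)
    return cls_out, loc_out, cls_weights, loc_weights


# Toy memory: three "regions" of a feature map (hypothetical values).
memory = [
    [1.0, 0.0],  # salient object interior
    [0.0, 1.0],  # object boundary
    [0.0, 0.0],  # background
]

cls_out, loc_out, cls_w, loc_w = decoupled_cross_attention([4.0, 0.0], [0.0, 4.0], memory)
print("classification attention:", [round(w, 3) for w in cls_w])
print("localization attention:  ", [round(w, 3) for w in loc_w])
```

Because each task-specific query gets its own attention pass over the same memory, the classification query in this toy example concentrates on the salient interior while the localization query concentrates on the boundary, rather than both being forced through one shared set of attention weights.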
Pages: 6578-6587
Page count: 10