Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection

Cited by: 0
Authors
Zhang, Manyuan [1 ,2 ]
Song, Guanglu [2 ]
Liu, Yu [2 ]
Li, Hongsheng [1 ,3 ,4 ]
Affiliations
[1] Chinese Univ HongKong, Multimedia Lab, Hong Kong, Peoples R China
[2] SenseTime Res, Hong Kong, Peoples R China
[3] Ctr Perceptual & Interact Intelligence Ltd, Hong Kong, Peoples R China
[4] Shanghai AI Lab, Shanghai, Peoples R China
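Funding
National Key Research and Development Program of China;
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract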
The introduction of DETR represents a new paradigm for object detection. However, its decoder conducts classification and box localization using shared queries and cross-attention layers, leading to suboptimal results. We observe that different regions of interest in the visual feature map are suitable for performing query classification and box localization tasks, even for the same object. Salient regions provide vital information for classification, while the boundaries around them are more favorable for box regression. Unfortunately, such spatial misalignment between these two tasks greatly hinders DETR's training. Therefore, in this work, we focus on decoupling localization and classification tasks in DETR. To achieve this, we introduce a new design scheme called spatially decoupled DETR (SD-DETR), which includes a task-aware query generation module and a disentangled feature learning process. We elaborately design the task-aware query initialization process and divide the cross-attention block in the decoder to allow the task-aware queries to match different visual regions. Meanwhile, we also observe a prediction misalignment problem between high classification confidence and precise localization, so we propose an alignment loss to further guide the training of spatially decoupled DETR. Through extensive experiments, we demonstrate that our approach achieves a significant improvement on MSCOCO datasets compared to previous work. For instance, we improve the performance of Conditional DETR by 4.5 AP. By spatially disentangling the two tasks, our method overcomes the misalignment problem and greatly improves the performance of DETR for object detection.
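The idea of dividing the cross-attention block so that task-specific queries can attend to different visual regions can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the single-head attention, the linear projections `to_cls` and `to_loc`, the random weights, and all function names below are hypothetical choices made only for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, memory, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention (queries attend to memory)."""
    q = matmul(queries, w_q)
    k = matmul(memory, w_k)
    v = matmul(memory, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    weights = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Random weights standing in for learned parameters (assumption)."""
    return [[rng.gauss(0.0, 1.0) / math.sqrt(cols) for _ in range(cols)] for _ in range(rows)]


def decoupled_cross_attention(object_queries, memory, params):
    """Derive one query set per task, then attend with separate weights.

    Each branch has its own attention map, so classification and
    localization are free to focus on different feature locations.
    """
    cls_q = matmul(object_queries, params["to_cls"])  # hypothetical projection
    loc_q = matmul(object_queries, params["to_loc"])  # hypothetical projection
    cls_feat, cls_attn = cross_attention(cls_q, memory, *params["cls_attn"])
    loc_feat, loc_attn = cross_attention(loc_q, memory, *params["loc_attn"])
    return cls_feat, loc_feat, cls_attn, loc_attn


rng = random.Random(0)
dim, num_queries, num_tokens = 8, 4, 12
params = {
    "to_cls": random_matrix(dim, dim, rng),
    "to_loc": random_matrix(dim, dim, rng),
    "cls_attn": tuple(random_matrix(dim, dim, rng) for _ in range(3)),
    "loc_attn": tuple(random_matrix(dim, dim, rng) for _ in range(3)),
}
object_queries = random_matrix(num_queries, dim, rng)
memory = random_matrix(num_tokens, dim, rng)  # flattened feature-map tokens

cls_feat, loc_feat, cls_attn, loc_attn = decoupled_cross_attention(
    object_queries, memory, params
)
```

In a shared-query decoder, both prediction heads would read from one attended feature per query. Here `cls_feat` and `loc_feat` come from independent attention maps over the same `memory`, which is the property the abstract attributes to the decoupled design.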
Pages: 6578-6587
Number of pages: 10