Transformer-Based Context Condensation for Boosting Feature Pyramids in Object Detection

Cited by: 0
Authors
Zhe Chen
Jing Zhang
Yufei Xu
Dacheng Tao
Affiliations
[1] The University of Sydney, Faculty of Engineering, School of Computer Science
Source
Keywords
Object detection; Feature pyramid; Context modeling
DOI
Not available
Abstract
Current object detectors typically include a feature pyramid (FP) module for multi-level feature fusion (MFF), which aims to mitigate the gap between features from different levels and form a comprehensive object representation for better detection performance. However, existing FPs usually require heavy cross-level connections or iterative refinement to obtain good MFF results, making them structurally complicated and computationally inefficient. To address these issues, we propose a novel and efficient context modeling mechanism that helps existing FPs deliver better MFF results while effectively reducing computational costs. In particular, we introduce the insight that comprehensive contexts can be decomposed and condensed into two types of representations for higher efficiency: a locally concentrated representation, which extracts context cues from nearby areas, and a globally summarized representation, which extracts general contextual representations of the whole image scene as global context cues. Given the condensed contexts, we employ a Transformer decoder to investigate the relations between them and each local feature from the FP, and then refine the MFF results accordingly. The result is a simple and lightweight Transformer-based Context Condensation (TCC) module that can boost various FPs while lowering their computational costs. Extensive experiments on the challenging MS COCO dataset show that TCC is compatible with four representative FPs and consistently improves their detection accuracy by up to 7.8% in terms of average precision while reducing their complexity by up to around 20% in terms of GFLOPs, helping them achieve state-of-the-art performance more efficiently. Code will be released at https://github.com/zhechen/TCC.
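The two-part condensation described in the abstract can be sketched in a few lines. The following is an illustrative NumPy toy, not the authors' released code: a single pyramid level's feature map is condensed into a small set of context tokens (average-pooled local summaries plus one global summary), and every local feature is then refined by single-head cross-attention over those tokens. The function names, pooling window, and tensor shapes are all assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def condense_contexts(feat, pool=4):
    """feat: (H, W, C) feature map from one pyramid level.
    Returns (K, C) condensed context tokens: (H//pool)*(W//pool)
    locally concentrated tokens plus one globally summarized token."""
    H, W, C = feat.shape
    local = (feat[:H - H % pool, :W - W % pool]
             .reshape(H // pool, pool, W // pool, pool, C)
             .mean(axis=(1, 3))
             .reshape(-1, C))                                 # local context cues
    global_ctx = feat.reshape(-1, C).mean(axis=0, keepdims=True)  # global cue
    return np.concatenate([local, global_ctx], axis=0)

def refine_with_contexts(feat, ctx):
    """Decoder-style cross-attention: queries are the local features,
    keys/values are the condensed contexts; residual refinement."""
    H, W, C = feat.shape
    q = feat.reshape(-1, C)                                   # (H*W, C) queries
    attn = softmax(q @ ctx.T / np.sqrt(C), axis=-1)           # (H*W, K) weights
    return (q + attn @ ctx).reshape(H, W, C)

rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 16, 32))
ctx = condense_contexts(feat)            # 17 context tokens for 256 positions
refined = refine_with_contexts(feat, ctx)
print(ctx.shape, refined.shape)          # (17, 32) (16, 16, 32)
```

In this toy setting each local feature attends to 17 condensed tokens rather than all 256 spatial positions, which is where the paper's claimed efficiency gain over dense cross-level attention would come from.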
Pages: 2738-2756
Page count: 18
Published in: INTERNATIONAL JOURNAL OF COMPUTER VISION, 2023, 131 (10): 2738-2756