Transformer-Based Context Condensation for Boosting Feature Pyramids in Object Detection

Cited by: 0
Authors
Zhe Chen
Jing Zhang
Yufei Xu
Dacheng Tao
Affiliation
[1] The University of Sydney, Faculty of Engineering, School of Computer Science
Keywords
Object detection; Feature pyramid; Context modeling; 35A01; 65L10; 65L12; 65L20; 65L70
DOI
Not available
Abstract
Current object detectors typically have a feature pyramid (FP) module for multi-level feature fusion (MFF), which aims to mitigate the gap between features from different levels and form a comprehensive object representation to achieve better detection performance. However, they usually require heavy cross-level connections or iterative refinement to obtain better MFF results, making them complicated in structure and inefficient in computation. To address these issues, we propose a novel and efficient context modeling mechanism that can help existing FPs deliver better MFF results while effectively reducing computational costs. In particular, we introduce the novel insight that comprehensive contexts can be decomposed and condensed into two types of representations for higher efficiency: a locally concentrated representation and a globally summarized representation. The former extracts context cues from nearby areas, while the latter extracts a general contextual representation of the whole image scene as global context cues. Having collected the condensed contexts, we employ a Transformer decoder to investigate the relations between them and each local feature from the FP, and then refine the MFF results accordingly. As a result, we obtain a simple and lightweight Transformer-based Context Condensation (TCC) module, which can boost various FPs and lower their computational costs simultaneously. Extensive experimental results on the challenging MS COCO dataset show that TCC is compatible with four representative FPs, consistently improving their detection accuracy by up to 7.8% in terms of average precision and reducing their complexity by up to around 20% in terms of GFLOPs, which helps them achieve state-of-the-art performance more efficiently. Code will be released at https://github.com/zhechen/TCC.
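The abstract describes the condensation idea only at a high level. The sketch below is a hypothetical NumPy illustration of that idea, not the authors' TCC module. It assumes a single FP level of shape (C, H, W), approximates the locally concentrated representation by a k×k mean around each position and the globally summarized representation by g×g region-average tokens, and replaces the learned Transformer decoder with one parameter-free, single-head cross-attention step (no learned projections) followed by a residual connection. The function names `local_context`, `global_context`, and `condense_and_refine` and the parameters `k` and `g` are illustrative assumptions.

```python
import numpy as np


def local_context(feat, k=3):
    """Hypothetical local condensation: mean over the k x k neighborhood of each
    position (windows are clipped at the border, no zero padding)."""
    c, h, w = feat.shape
    r = k // 2
    out = np.empty(feat.shape, dtype=float)
    for i in range(h):
        for j in range(w):
            patch = feat[:, max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            out[:, i, j] = patch.mean(axis=(1, 2))
    return out


def global_context(feat, g=2):
    """Hypothetical global summary: average-pool the map into g x g region
    tokens. Returns shape (g*g, C). Assumes g <= min(H, W)."""
    c, h, w = feat.shape
    row_groups = np.array_split(np.arange(h), g)
    col_groups = np.array_split(np.arange(w), g)
    tokens = [
        feat[:, rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1].mean(axis=(1, 2))
        for rows in row_groups
        for cols in col_groups
    ]
    return np.stack(tokens)


def condense_and_refine(feat, k=3, g=2):
    """Each position (query) attends to its own local token plus the shared
    global tokens; the attended context is added back as a residual."""
    c, h, w = feat.shape
    queries = feat.reshape(c, -1).T                      # (HW, C)
    loc_tokens = local_context(feat, k).reshape(c, -1).T  # (HW, C)
    glob = global_context(feat, g)                        # (G, C)
    # Per-position key/value set: 1 local token + G global tokens.
    keys = np.concatenate(
        [loc_tokens[:, None, :], np.broadcast_to(glob, (h * w,) + glob.shape)],
        axis=1,
    )                                                     # (HW, 1 + G, C)
    scores = np.einsum("nc,nkc->nk", queries, keys) / np.sqrt(c)
    scores -= scores.max(axis=1, keepdims=True)           # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)               # softmax over keys
    context = np.einsum("nk,nkc->nc", attn, keys)         # values = keys here
    refined = queries + context                           # residual refinement
    return refined.T.reshape(c, h, w), attn


if __name__ == "__main__":
    feat = np.random.default_rng(0).standard_normal((8, 6, 6))
    refined, attn = condense_and_refine(feat)
    print(refined.shape, attn.shape)  # (8, 6, 6) (36, 5)
```

In this sketch each position attends to 1 + g² condensed tokens rather than to all H·W positions, so the number of keys per position stays fixed as the map grows; for the 6×6 map in the usage example, that is 5 keys per position instead of 36.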
Pages: 2738-2756
Page count: 18