Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors

Cited by: 14
Authors
Zhang, Gongjie [1 ,2 ]
Luo, Zhipeng [1 ,3 ]
Tian, Zichen [1 ]
Zhang, Jingyi [1 ]
Zhang, Xiaoqin [4 ]
Lu, Shijian [1 ]
Affiliations
[1] Nanyang Technol Univ, S Lab, Singapore, Singapore
[2] Black Sesame Technol, Singapore, Singapore
[3] SenseTime Res, Hong Kong, Peoples R China
[4] Wenzhou Univ, Wenzhou, Peoples R China
Source
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
DOI
10.1109/CVPR52729.2023.00601
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Multi-scale features have been proven highly effective for object detection but often come with huge and even prohibitive extra computation costs, especially for the recent Transformer-based detectors. In this paper, we propose Iterative Multi-scale Feature Aggregation (IMFA) - a generic paradigm that enables efficient use of multi-scale features in Transformer-based object detectors. The core idea is to exploit sparse multi-scale features from just a few crucial locations, and it is achieved with two novel designs. First, IMFA rearranges the Transformer encoder-decoder pipeline so that the encoded features can be iteratively updated based on the detection predictions. Second, IMFA sparsely samples scale-adaptive features for refined detection from just a few keypoint locations under the guidance of prior detection predictions. As a result, the sampled multi-scale features are sparse yet still highly beneficial for object detection. Extensive experiments show that the proposed IMFA boosts the performance of multiple Transformer-based object detectors significantly yet with only slight computational overhead.
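To make the second design in the abstract more concrete, the sketch below illustrates the general idea of sampling sparse, scale-adaptive features at a few locations suggested by prior box predictions. It is a minimal illustration in PyTorch, not the authors' implementation: the function name, the fixed set of sampling offsets, the box-size-based level-selection rule, and all tensor shapes are assumptions made for the example.

```python
# Minimal sketch (assumed, not the paper's code): sample features at a few
# points per predicted box, choosing the feature-map level by box size.
import torch
import torch.nn.functional as F


def sample_sparse_multiscale_features(feature_maps, boxes_cxcywh, num_points=4):
    """feature_maps : list of (B, C, H_l, W_l) tensors, fine -> coarse.
    boxes_cxcywh   : (B, Q, 4) prior box predictions, normalized to [0, 1].
    Returns        : (B, Q, num_points, C) sparse multi-scale features."""
    B, Q, _ = boxes_cxcywh.shape
    cx, cy, w, h = boxes_cxcywh.unbind(-1)                      # each (B, Q)

    # A few sampling points inside each box (center plus shifted points).
    offsets = torch.tensor([[0.0, 0.0], [-0.25, -0.25],
                            [0.25, -0.25], [0.0, 0.25]],
                           device=boxes_cxcywh.device)           # (P, 2)
    px = cx.unsqueeze(-1) + offsets[:, 0] * w.unsqueeze(-1)      # (B, Q, P)
    py = cy.unsqueeze(-1) + offsets[:, 1] * h.unsqueeze(-1)

    # Scale-adaptive level selection (assumed rule): larger box -> coarser level.
    num_levels = len(feature_maps)
    box_size = (w * h).clamp(min=1e-6).sqrt()                    # (B, Q)
    level = (box_size * num_levels).long().clamp(max=num_levels - 1)

    # grid_sample expects coordinates in [-1, 1].
    grid = torch.stack([px, py], dim=-1) * 2.0 - 1.0             # (B, Q, P, 2)

    sampled_per_level = []
    for fmap in feature_maps:
        s = F.grid_sample(fmap, grid, align_corners=False)       # (B, C, Q, P)
        sampled_per_level.append(s.permute(0, 2, 3, 1))          # (B, Q, P, C)
    sampled = torch.stack(sampled_per_level, dim=0)              # (L, B, Q, P, C)

    # Keep, for every query, only the features from its selected level.
    idx = level.view(1, B, Q, 1, 1).expand(1, B, Q, num_points, sampled.size(-1))
    return sampled.gather(0, idx).squeeze(0)                     # (B, Q, P, C)


if __name__ == "__main__":
    feats = [torch.randn(2, 256, s, s) for s in (64, 32, 16)]    # toy pyramid
    boxes = torch.rand(2, 100, 4) * 0.5 + 0.25                   # toy predictions
    print(sample_sparse_multiscale_features(feats, boxes).shape) # (2, 100, 4, 256)
```

In the paper's pipeline these sparsely sampled features would then be fed back so the encoded features and detections can be refined iteratively; the sketch only covers the sampling step.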
Pages: 6206-6216
Number of pages: 11
Related Papers
50 records in total
  • [21] Gou, Quandeng; Ren, Yuheng. Research on Multi-Scale CNN and Transformer-Based Multi-Level Multi-Classification Method for Images. IEEE ACCESS, 2024, 12: 103049-103059.
  • [22] Liu, Hongying; Zhang, Fuquan; Xu, Yiqing; Wang, Junling; Lu, Hong; Wei, Wei; Zhu, Jun. TFNet: Transformer-Based Multi-Scale Feature Fusion Forest Fire Image Detection Network. FIRE-SWITZERLAND, 2025, 8 (02).
  • [23] Yang, Lei; Wang, Hongyong; Gu, Yuge; Bian, Guibin; Liu, Yanhong; Yu, Hongnian. TMA-Net: A Transformer-Based Multi-Scale Attention Network for Surgical Instrument Segmentation. IEEE TRANSACTIONS ON MEDICAL ROBOTICS AND BIONICS, 2023, 5 (02): 323-334.
  • [24] Kannan, Shyam Sundar; Min, Byung-Cheol. PlaceFormer: Transformer-Based Visual Place Recognition Using Multi-Scale Patch Selection and Fusion. IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (07): 6552-6559.
  • [25] Lu, Guanlin; He, Xiaohui; Wang, Qiang; Shao, Faming; Wang, Hongwei; Wang, Jinkang. A Novel Multi-Scale Transformer for Object Detection in Aerial Scenes. DRONES, 2022, 6 (08).
  • [26] Yang, Lei; Cao, Jianzhong; Chen, Weining; Wang, Hao; He, Lang. An Efficient Multi-Scale Transformer for Satellite Image Dehazing. EXPERT SYSTEMS, 2024, 41 (08).
  • [27] Huang, Kailai; Wen, Mi; Wang, Chen; Ling, Lina. FPDT: A Multi-Scale Feature Pyramidal Object Detection Transformer. JOURNAL OF APPLIED REMOTE SENSING, 2023, 17 (02).
  • [28] Wu, Fan; Zhang, Yifeng. UniTracker: Transformer-Based CrossUnihead for Multi-Object Tracking. JOURNAL OF REAL-TIME IMAGE PROCESSING, 2024, 21 (04).
  • [29] Lin, Zhe; Hua, Gang; Davis, Larry S. Multi-Scale Shared Features for Cascade Object Detection. 2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012), 2012: 1865-1868.
  • [30] Cheng, Gong; Gao, Decheng; Liu, Yang; Han, Junwei. Multi-Scale and Discriminative Part Detectors Based Features for Multi-Label Image Classification. PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018: 649-655.