Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors

Cited by: 14
Authors
Zhang, Gongjie [1 ,2 ]
Luo, Zhipeng [1 ,3 ]
Tian, Zichen [1 ]
Zhang, Jingyi [1 ]
Zhang, Xiaoqin [4 ]
Lu, Shijian [1 ]
Affiliations
[1] Nanyang Technol Univ, S Lab, Singapore, Singapore
[2] Black Sesame Technol, Singapore, Singapore
[3] SenseTime Res, Hong Kong, Peoples R China
[4] Wenzhou Univ, Wenzhou, Peoples R China
Source
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
DOI
10.1109/CVPR52729.2023.00601
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Multi-scale features have been proven highly effective for object detection but often come with huge and even prohibitive extra computation costs, especially for the recent Transformer-based detectors. In this paper, we propose Iterative Multi-scale Feature Aggregation (IMFA) - a generic paradigm that enables efficient use of multi-scale features in Transformer-based object detectors. The core idea is to exploit sparse multi-scale features from just a few crucial locations, and it is achieved with two novel designs. First, IMFA rearranges the Transformer encoder-decoder pipeline so that the encoded features can be iteratively updated based on the detection predictions. Second, IMFA sparsely samples scale-adaptive features for refined detection from just a few keypoint locations under the guidance of prior detection predictions. As a result, the sampled multi-scale features are sparse yet still highly beneficial for object detection. Extensive experiments show that the proposed IMFA boosts the performance of multiple Transformer-based object detectors significantly yet with only slight computational overhead.
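To make the second design in the abstract more concrete, the sketch below illustrates the general idea of sampling sparse, scale-adaptive features at a few locations suggested by prior box predictions. It is a minimal illustration in PyTorch, not the authors' implementation: the function name, the fixed set of sampling offsets, the box-size-based level-selection rule, and all tensor shapes are assumptions made for the example.

```python
# Minimal sketch (assumed, not the paper's code): sample features at a few
# points per predicted box, choosing the feature-map level by box size.
import torch
import torch.nn.functional as F


def sample_sparse_multiscale_features(feature_maps, boxes_cxcywh, num_points=4):
    """feature_maps : list of (B, C, H_l, W_l) tensors, fine -> coarse.
    boxes_cxcywh   : (B, Q, 4) prior box predictions, normalized to [0, 1].
    Returns        : (B, Q, num_points, C) sparse multi-scale features."""
    B, Q, _ = boxes_cxcywh.shape
    cx, cy, w, h = boxes_cxcywh.unbind(-1)                      # each (B, Q)

    # A few sampling points inside each box (center plus shifted points).
    offsets = torch.tensor([[0.0, 0.0], [-0.25, -0.25],
                            [0.25, -0.25], [0.0, 0.25]],
                           device=boxes_cxcywh.device)           # (P, 2)
    px = cx.unsqueeze(-1) + offsets[:, 0] * w.unsqueeze(-1)      # (B, Q, P)
    py = cy.unsqueeze(-1) + offsets[:, 1] * h.unsqueeze(-1)

    # Scale-adaptive level selection (assumed rule): larger box -> coarser level.
    num_levels = len(feature_maps)
    box_size = (w * h).clamp(min=1e-6).sqrt()                    # (B, Q)
    level = (box_size * num_levels).long().clamp(max=num_levels - 1)

    # grid_sample expects coordinates in [-1, 1].
    grid = torch.stack([px, py], dim=-1) * 2.0 - 1.0             # (B, Q, P, 2)

    sampled_per_level = []
    for fmap in feature_maps:
        s = F.grid_sample(fmap, grid, align_corners=False)       # (B, C, Q, P)
        sampled_per_level.append(s.permute(0, 2, 3, 1))          # (B, Q, P, C)
    sampled = torch.stack(sampled_per_level, dim=0)              # (L, B, Q, P, C)

    # Keep, for every query, only the features from its selected level.
    idx = level.view(1, B, Q, 1, 1).expand(1, B, Q, num_points, sampled.size(-1))
    return sampled.gather(0, idx).squeeze(0)                     # (B, Q, P, C)


if __name__ == "__main__":
    feats = [torch.randn(2, 256, s, s) for s in (64, 32, 16)]    # toy pyramid
    boxes = torch.rand(2, 100, 4) * 0.5 + 0.25                   # toy predictions
    print(sample_sparse_multiscale_features(feats, boxes).shape) # (2, 100, 4, 256)
```

In the paper's pipeline these sparsely sampled features would then be fed back so the encoded features and detections can be refined iteratively; the sketch only covers the sampling step.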
Pages: 6206-6216
Number of pages: 11
Related Papers
50 records in total
  • [21] Gou, Quandeng; Ren, Yuheng. Research on Multi-Scale CNN and Transformer-Based Multi-Level Multi-Classification Method for Images. IEEE ACCESS, 2024, 12: 103049-103059.
  • [22] Liu, Hongying; Zhang, Fuquan; Xu, Yiqing; Wang, Junling; Lu, Hong; Wei, Wei; Zhu, Jun. TFNet: Transformer-Based Multi-Scale Feature Fusion Forest Fire Image Detection Network. FIRE-SWITZERLAND, 2025, 8 (02).
  • [23] Yang, Lei; Wang, Hongyong; Gu, Yuge; Bian, Guibin; Liu, Yanhong; Yu, Hongnian. TMA-Net: A Transformer-Based Multi-Scale Attention Network for Surgical Instrument Segmentation. IEEE TRANSACTIONS ON MEDICAL ROBOTICS AND BIONICS, 2023, 5 (02): 323-334.
  • [24] Kannan, Shyam Sundar; Min, Byung-Cheol. PlaceFormer: Transformer-Based Visual Place Recognition Using Multi-Scale Patch Selection and Fusion. IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (07): 6552-6559.
  • [25] Lu, Guanlin; He, Xiaohui; Wang, Qiang; Shao, Faming; Wang, Hongwei; Wang, Jinkang. A Novel Multi-Scale Transformer for Object Detection in Aerial Scenes. DRONES, 2022, 6 (08).
  • [26] Yang, Lei; Cao, Jianzhong; Chen, Weining; Wang, Hao; He, Lang. An Efficient Multi-Scale Transformer for Satellite Image Dehazing. EXPERT SYSTEMS, 2024, 41 (08).
  • [27] Huang, Kailai; Wen, Mi; Wang, Chen; Ling, Lina. FPDT: A Multi-Scale Feature Pyramidal Object Detection Transformer. JOURNAL OF APPLIED REMOTE SENSING, 2023, 17 (02).
  • [28] Wu, Fan; Zhang, Yifeng. UniTracker: Transformer-Based CrossUnihead for Multi-Object Tracking. JOURNAL OF REAL-TIME IMAGE PROCESSING, 2024, 21 (04).
  • [29] Lin, Zhe; Hua, Gang; Davis, Larry S. Multi-Scale Shared Features for Cascade Object Detection. 2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012), 2012: 1865-1868.
  • [30] Cheng, Gong; Gao, Decheng; Liu, Yang; Han, Junwei. Multi-Scale and Discriminative Part Detectors Based Features for Multi-Label Image Classification. PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018: 649-655.