PYRAMID MASKED IMAGE MODELING FOR TRANSFORMER-BASED AERIAL OBJECT DETECTION

Cited by: 2
Authors
Zhang, Cong [1]
Liu, Tianshan [1]
Ju, Yakun [1]
Lam, Kin-Man [1]
Affiliation
[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Kowloon, Hong Kong, Peoples R China
Keywords
Vision Transformer; Masked Image Modeling; Self-Supervised Learning; Pyramid Architecture; Aerial Object Detection
DOI
10.1109/ICIP49359.2023.10223093
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Two obstacles, the scarcity of annotated samples and the difficulty in preserving multi-scale hierarchical representations, hinder the advancement of vision Transformer-based aerial object detection. The emergence of self-supervised learning has inspired some solutions to the first issue. However, most solutions focus on single-scale features, conflicting with solving the second issue. To bridge this gap, this paper proposes a novel pyramid masked image modeling (MIM) framework, termed PyraMIM, for self-supervised pretraining in aerial scenarios. Without manual annotation, PyraMIM enables establishing pyramid representations during pretraining, which can be seamlessly adapted to downstream aerial object detection for performance improvement. Experimental results demonstrate the effectiveness and superiority of our method.
Pages: 1675-1679
Number of pages: 5