PYRAMID MASKED IMAGE MODELING FOR TRANSFORMER-BASED AERIAL OBJECT DETECTION

被引:2
|
作者
Zhang, Cong [1 ]
Liu, Tianshan [1 ]
Ju, Yakun [1 ]
Lam, Kin-Man [1 ]
机构
[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Kowloon, Hong Kong, Peoples R China
关键词
Vision Transformer; Masked Image Modeling; Self-Supervised Learning; Pyramid Architecture; Aerial Object Detection;
D O I
10.1109/ICIP49359.2023.10223093
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Two obstacles, the scarcity of annotated samples and the difficulty in preserving multi-scale hierarchical representations, hinder the advancement of vision Transformer-based aerial object detection. The emergence of self-supervised learning has inspired some solutions to the first issue. However, most solutions focus on single-scale features, conflicting with solving the second issue. To bridge this gap, this paper proposes a novel pyramid masked image modeling (MIM) framework, termed PyraMIM, for self-supervised pretraining in aerial scenarios. Without manual annotation, PyraMIM enables establishing pyramid representations during pretraining, which can be seamlessly adapted to downstream aerial object detection for performance improvement. Experimental results demonstrate the effectiveness and superiority of our method.
引用
下载
收藏
页码:1675 / 1679
页数:5
相关论文
共 50 条
  • [1] Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
    Fang, Yuxin
    Yang, Shusheng
    Wang, Shijie
    Ge, Yixiao
    Shan, Ying
    Wang, Xinggang
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 6221 - 6230
  • [2] Transformer-based End-to-End Object Detection in Aerial Images
    Vo, Nguyen D.
    Le, Nguyen
    Ngo, Giang
    Doan, Du
    Le, Do
    Nguyen, Khang
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (10) : 1072 - 1079
  • [3] A Transformer-Based Framework for Tiny Object Detection
    Liao, Yi-Kai
    Lin, Gong-Si
    Yeh, Mei-Chen
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 373 - 377
  • [4] Survey of Transformer-Based Object Detection Algorithms
    Li, Jian
    Du, Jianqiang
    Zhu, Yanchen
    Guo, Yongkun
    Computer Engineering and Applications, 2023, 59 (10) : 48 - 64
  • [5] Transformer-Based Masked Autoencoder With Contrastive Loss for Hyperspectral Image Classification
    Cao, Xianghai
    Lin, Haifeng
    Guo, Shuaixu
    Xiong, Tao
    Jiao, Licheng
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [6] Transformer-Based Multi-object Tracking in Unmanned Aerial Vehicles
    Li, Jiaxin
    Li, Hongjun
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VI, 2024, 14430 : 347 - 358
  • [7] A Novel Transformer-Based Adaptive Object Detection Method
    Su, Shuzhi
    Chen, Runbin
    Fang, Xianjin
    Zhang, Tian
    ELECTRONICS, 2023, 12 (03)
  • [8] Rethinking Transformer-based Set Prediction for Object Detection
    Sun, Zhiqing
    Cao, Shengcao
    Yang, Yiming
    Kitani, Kris
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 3591 - 3600
  • [9] ATTENTION-GUIDED CONTRASTIVE MASKED IMAGE MODELING FOR TRANSFORMER-BASED SELF-SUPERVISED LEARNING
    Zhan, Yucheng
    Zhao, Yucheng
    Luo, Chong
    Zhang, Yueyi
    Sun, Xiaoyan
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 2490 - 2494
  • [10] AERIAL IMAGE OBJECT DETECTION WITH VISION TRANSFORMER DETECTOR (VITDET)
    Wang, Liya
    Tien, Alex
    IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 6450 - 6453