PYRAMID MASKED IMAGE MODELING FOR TRANSFORMER-BASED AERIAL OBJECT DETECTION

Cited by: 2
Authors
Zhang, Cong [1]
Liu, Tianshan [1]
Ju, Yakun [1]
Lam, Kin-Man [1]
Affiliations
[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Kowloon, Hong Kong, Peoples R China
Keywords
Vision Transformer; Masked Image Modeling; Self-Supervised Learning; Pyramid Architecture; Aerial Object Detection;
DOI
10.1109/ICIP49359.2023.10223093
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Two obstacles, the scarcity of annotated samples and the difficulty in preserving multi-scale hierarchical representations, hinder the advancement of vision Transformer-based aerial object detection. The emergence of self-supervised learning has inspired some solutions to the first issue. However, most solutions focus on single-scale features, conflicting with solving the second issue. To bridge this gap, this paper proposes a novel pyramid masked image modeling (MIM) framework, termed PyraMIM, for self-supervised pretraining in aerial scenarios. Without manual annotation, PyraMIM enables establishing pyramid representations during pretraining, which can be seamlessly adapted to downstream aerial object detection for performance improvement. Experimental results demonstrate the effectiveness and superiority of our method.
Pages: 1675-1679
Number of pages: 5