Rotary Position Embedding for Vision Transformer

Cited by: 2
Authors
Heo, Byeongho [1]
Park, Song [1]
Han, Dongyoon [1]
Yun, Sangdoo [1]
Affiliations
[1] NAVER AI Lab, Seongnam, South Korea
DOI
10.1007/978-3-031-72684-2_17
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Rotary Position Embedding (RoPE) performs remarkably on language models, especially for length extrapolation of Transformers. However, the impacts of RoPE on computer vision domains have been underexplored, even though RoPE appears capable of enhancing Vision Transformer (ViT) performance in a way similar to the language domain. This study provides a comprehensive analysis of RoPE when applied to ViTs, utilizing practical implementations of RoPE for 2D vision data. The analysis reveals that RoPE demonstrates impressive extrapolation performance, i.e., maintaining precision while increasing image resolution at inference. It eventually leads to performance improvement for ImageNet-1k, COCO detection, and ADE-20k segmentation. We believe this study provides thorough guidelines to apply RoPE into ViT, promising improved backbone performance with minimal extra computational overhead. Our code and pre-trained models are available at https://github.com/naver-ai/rope-vit
Pages: 289-305
Page count: 17