Training data-efficient image transformers & distillation through attention

Cited by: 0
Authors
Touvron, Hugo [1, 2]
Cord, Matthieu [1, 2]
Douze, Matthijs [1 ]
Massa, Francisco [1 ]
Sablayrolles, Alexandre [1 ]
Jegou, Herve [1 ]
Affiliations
[1] Facebook AI, Menlo Park, CA 94025 USA
[2] Sorbonne University, Paris, France
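Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract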
Recently, neural networks purely based on attention were shown to address image understanding tasks such as image classification. These high-performing vision transformers are pre-trained with hundreds of millions of images using a large infrastructure, thereby limiting their adoption. In this work, we produce competitive convolution-free transformers trained on ImageNet only, using a single computer, in less than 3 days. Our reference vision transformer (86M parameters) achieves top-1 accuracy of 83.1% (single-crop) on ImageNet with no external data. We also introduce a teacher-student strategy specific to transformers. It relies on a distillation token ensuring that the student learns from the teacher through attention, typically from a convnet teacher. The learned transformers are competitive (85.2% top-1 acc.) with the state of the art on ImageNet, and similarly when transferred to other tasks. We will share our code and models.
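The abstract does not spell out the training objective, so the following is only a minimal sketch of how a distillation token could be trained, assuming a hard-label variant: the class-token head is supervised by the ground-truth label, the distillation-token head is supervised by the teacher's predicted label, and the two cross-entropy terms are averaged. The function names, the equal weighting, and the toy logits are illustrative assumptions, not details taken from this record.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def cross_entropy(logits, target):
    # Negative log-probability assigned to the target class index.
    return -log_softmax(logits)[target]

def hard_distillation_loss(cls_logits, dist_logits, teacher_logits, label):
    # Assumption: the teacher's argmax prediction is used as a hard pseudo-label.
    teacher_label = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)
    loss_cls = cross_entropy(cls_logits, label)            # class token vs. true label
    loss_dist = cross_entropy(dist_logits, teacher_label)  # distillation token vs. teacher
    # Assumption: both terms are weighted equally.
    return 0.5 * loss_cls + 0.5 * loss_dist

# Toy example with three classes (values are made up for illustration).
loss = hard_distillation_loss(
    cls_logits=[2.0, 0.5, -1.0],
    dist_logits=[0.1, 1.5, 0.3],
    teacher_logits=[0.2, 3.0, 0.1],
    label=0,
)
```

In this sketch the two heads receive different targets, which is what lets the distillation token carry the teacher's signal separately from the class token.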
Pages: 7358-7367
Page count: 10