Training data-efficient image transformers & distillation through attention

Cited by: 0
Authors
Touvron, Hugo [1 ,2 ]
Cord, Matthieu [1 ,2 ]
Douze, Matthijs [1 ]
Massa, Francisco [1 ]
Sablayrolles, Alexandre [1 ]
Jégou, Hervé [1]
Affiliations
[1] Facebook AI, Menlo Park, CA 94025 USA
[2] Sorbonne Univ, Paris, France
Keywords
DOI
Not available
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405
Abstract
Recently, neural networks purely based on attention were shown to address image understanding tasks such as image classification. These high-performing vision transformers are pre-trained with hundreds of millions of images using a large infrastructure, thereby limiting their adoption. In this work, we produce competitive convolution-free transformers trained on ImageNet only using a single computer in less than 3 days. Our reference vision transformer (86M parameters) achieves top-1 accuracy of 83.1% (single-crop) on ImageNet with no external data. We also introduce a teacher-student strategy specific to transformers. It relies on a distillation token ensuring that the student learns from the teacher through attention, typically from a convnet teacher. The learned transformers are competitive (85.2% top-1 acc.) with the state of the art on ImageNet, and similarly when transferred to other tasks. We will share our code and models.
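The abstract's teacher-student strategy adds a distillation token whose output is supervised by the teacher's prediction, alongside the class token supervised by the ground-truth label. In the paper's hard-label variant, the two cross-entropy terms are averaged with equal weight. A minimal pure-Python sketch of that objective is shown below (function names are illustrative, not from the authors' released code):

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, target):
    # Negative log-likelihood of the target class index.
    return -math.log(softmax(logits)[target])

def deit_hard_distill_loss(cls_logits, dist_logits, y_true, teacher_logits):
    # Hard-label distillation: the class token's output is trained against
    # the ground-truth label, the distillation token's output against the
    # teacher's argmax prediction; the two losses are averaged.
    y_teacher = max(range(len(teacher_logits)), key=lambda i: teacher_logits[i])
    return 0.5 * cross_entropy(cls_logits, y_true) + \
           0.5 * cross_entropy(dist_logits, y_teacher)
```

At inference time the paper fuses the two heads (e.g. by averaging their softmax outputs); the sketch above covers only the training objective.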
Pages: 7358-7367
Page count: 10
Related papers
50 records in total
  • [11] Data-Efficient Augmentation for Training Neural Networks
    Liu, Tian Yu
    Mirzasoleiman, Baharan
    Advances in Neural Information Processing Systems, 2022, 35
  • [13] Data-Efficient Training Strategies for Neural TTS Systems
    Prajwal, K. R.
    Jawahar, C. V.
    CODS-COMAD 2021: PROCEEDINGS OF THE 3RD ACM INDIA JOINT INTERNATIONAL CONFERENCE ON DATA SCIENCE & MANAGEMENT OF DATA (8TH ACM IKDD CODS & 26TH COMAD), 2021, : 223 - 227
  • [14] Sobolev Training for Data-efficient Approximate Nonlinear MPC
    Lueken, Lukas
    Brandner, Dean
    Lucia, Sergio
    IFAC PAPERSONLINE, 2023, 56 (02): : 5765 - 5772
  • [15] A Data-Efficient Training Method for Deep Reinforcement Learning
    Feng, Wenhui
    Han, Chongzhao
    Lian, Feng
    Liu, Xia
    ELECTRONICS, 2022, 11 (24)
  • [16] Sparse Winning Tickets are Data-Efficient Image Recognizers
    Varma, Mukund T.
    Chen, Xuxi
    Zhang, Zhenyu
    Chen, Tianlong
    Venugopalan, Subhashini
    Wang, Zhangyang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [17] Data-Efficient Image Recognition with Contrastive Predictive Coding
    Henaff, Olivier J.
    Srinivas, Aravind
    De Fauw, Jeffrey
    Razavi, Ali
    Doersch, Carl
    Eslami, S. M. Ali
    van den Oord, Aaron
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [18] GRTr: Generative-Retrieval Transformers for Data-Efficient Dialogue Domain Adaptation
    Shalyminov, Igor
    Sordoni, Alessandro
    Atkinson, Adam
    Schulz, Hannes
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 2484 - 2492
  • [20] Ensembles of data-efficient vision transformers as a new paradigm for automated classification in ecology
    Kyathanahally, S. P.
    Hardeman, T.
    Reyes, M.
    Merz, E.
    Bulas, T.
    Brun, P.
    Pomati, F.
    Baity-Jesi, M.
    SCIENTIFIC REPORTS, 2022, 12 (01)