Training data-efficient image transformers & distillation through attention

Cited by: 0
Authors
Touvron, Hugo [1, 2]
Cord, Matthieu [1, 2]
Douze, Matthijs [1]
Massa, Francisco [1]
Sablayrolles, Alexandre [1]
Jegou, Herve [1]
Affiliations
[1] Facebook AI, Menlo Park, CA 94025 USA
[2] Sorbonne University, Paris, France
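Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract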
Recently, neural networks purely based on attention were shown to address image understanding tasks such as image classification. These high-performing vision transformers are pre-trained with hundreds of millions of images using a large infrastructure, thereby limiting their adoption. In this work, we produce competitive convolution-free transformers trained on ImageNet only, using a single computer, in less than 3 days. Our reference vision transformer (86M parameters) achieves top-1 accuracy of 83.1% (single-crop) on ImageNet with no external data. We also introduce a teacher-student strategy specific to transformers. It relies on a distillation token ensuring that the student learns from the teacher through attention, typically from a convnet teacher. The learned transformers are competitive (85.2% top-1 acc.) with the state of the art on ImageNet, and similarly when transferred to other tasks. We will share our code and models.
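The abstract does not spell out the training objective behind the distillation token, so the following is only a minimal sketch under stated assumptions: the student is assumed to expose two output heads (one on the class token, one on the distillation token), the teacher's prediction is assumed to be used as a hard label, and the two loss terms are assumed to be weighted equally. The function and argument names (`cross_entropy`, `hard_distillation_loss`, `cls_logits`, `dist_logits`, `teacher_logits`) are hypothetical and are not taken from the authors' code.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def hard_distillation_loss(cls_logits, dist_logits, teacher_logits, label):
    """Sketch of a two-head distillation objective (assumed form).

    The class-token head is compared with the ground-truth label, and the
    distillation-token head with the teacher's arg-max prediction.
    The equal 0.5 / 0.5 weighting is an assumption of this sketch.
    """
    teacher_label = max(range(len(teacher_logits)),
                        key=teacher_logits.__getitem__)
    loss_cls = cross_entropy(cls_logits, label)
    loss_dist = cross_entropy(dist_logits, teacher_label)
    return 0.5 * loss_cls + 0.5 * loss_dist


if __name__ == "__main__":
    # Toy 3-class example with made-up logits.
    loss = hard_distillation_loss(
        cls_logits=[2.0, 0.5, -1.0],
        dist_logits=[0.1, 1.5, -0.3],
        teacher_logits=[0.2, 3.0, 0.1],  # teacher predicts class 1
        label=0,                         # ground truth is class 0
    )
    print(f"combined loss: {loss:.4f}")
```

In this sketch the two heads can receive different targets for the same image (here, the true label 0 and the teacher label 1), which is one way a separate distillation token could carry the teacher's signal without overriding the supervised one.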
Pages: 7358-7367
Page count: 10