Training data-efficient image transformers & distillation through attention

Cited: 0
Authors
Touvron, Hugo [1 ,2 ]
Cord, Matthieu [1 ,2 ]
Douze, Matthijs [1 ]
Massa, Francisco [1 ]
Sablayrolles, Alexandre [1 ]
Jegou, Herve [1 ]
Affiliations
[1] Facebook AI, Menlo Park, CA 94025, USA
[2] Sorbonne Univ, Paris, France
Keywords
DOI
Not available
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Recently, neural networks based purely on attention were shown to address image understanding tasks such as image classification. These high-performing vision transformers are pre-trained with hundreds of millions of images using a large infrastructure, thereby limiting their adoption. In this work, we produce competitive convolution-free transformers trained on ImageNet only, using a single computer, in less than 3 days. Our reference vision transformer (86M parameters) achieves a top-1 accuracy of 83.1% (single-crop) on ImageNet with no external data. We also introduce a teacher-student strategy specific to transformers. It relies on a distillation token that ensures the student learns from the teacher through attention, typically from a convnet teacher. The learned transformers are competitive (85.2% top-1 acc.) with the state of the art on ImageNet, and similarly so when transferred to other tasks. We will share our code and models.
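The distillation token described in the abstract can be sketched in a few lines of PyTorch. The following is a minimal illustration, not the authors' released code: the names DistilledViT and hard_distillation_loss, the generic encoder argument, and the equal 0.5/0.5 loss weighting are assumptions made here for clarity, following the hard-distillation objective the paper describes.

```python
# Minimal sketch (assumed, not the authors' released code) of DeiT-style
# distillation through attention: a learnable distillation token is appended
# next to the class token, interacts with the patch tokens via self-attention,
# and its dedicated head is supervised by the teacher's hard predictions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistilledViT(nn.Module):  # hypothetical wrapper name
    def __init__(self, encoder: nn.Module, dim: int = 768, num_classes: int = 1000):
        super().__init__()
        self.encoder = encoder  # any transformer encoder over (B, N, dim) token sequences
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.dist_token = nn.Parameter(torch.zeros(1, 1, dim))  # the distillation token
        self.head = nn.Linear(dim, num_classes)       # predicts from the class token
        self.head_dist = nn.Linear(dim, num_classes)  # predicts from the distillation token

    def forward(self, patch_tokens: torch.Tensor):
        # patch_tokens: (B, N, dim) embedded image patches
        b = patch_tokens.size(0)
        tokens = torch.cat(
            [self.cls_token.expand(b, -1, -1),
             self.dist_token.expand(b, -1, -1),
             patch_tokens], dim=1)
        out = self.encoder(tokens)  # both extra tokens attend to the patches
        return self.head(out[:, 0]), self.head_dist(out[:, 1])

def hard_distillation_loss(logits_cls, logits_dist, teacher_logits, labels):
    # The class-token head learns from the true labels, while the
    # distillation-token head learns from the teacher's hard decisions
    # (argmax over the convnet teacher's logits).
    teacher_labels = teacher_logits.argmax(dim=1)
    return 0.5 * F.cross_entropy(logits_cls, labels) \
         + 0.5 * F.cross_entropy(logits_dist, teacher_labels)
```

At inference the two heads can be fused; a simple late fusion is to average their softmax outputs, so that the class and distillation tokens vote jointly on the predicted label.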
Pages: 7358-7367
Number of pages: 10
Related Papers
50 records in total
  • [1] DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers. Chen, Xianing; Cao, Qiong; Zhong, Yujie; Zhang, Jing; Gao, Shenghua; Tao, Dacheng. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022: 12042-12052.
  • [2] UAV Image Multi-Labeling with Data-Efficient Transformers. Bashmal, Laila; Bazi, Yakoub; Al Rahhal, Mohamad Mahmoud; Alhichri, Haikel; Al Ajlan, Naif. Applied Sciences-Basel, 2021, 11(9).
  • [3] Towards Data-Efficient Detection Transformers. Wang, Wen; Zhang, Jing; Cao, Yang; Shen, Yongliang; Tao, Dacheng. Computer Vision, ECCV 2022, Pt. IX, 2022, 13669: 88-105.
  • [4] Cascaded Cross-Attention Networks for Data-Efficient Whole-Slide Image Classification Using Transformers. Khader, Firas; Kather, Jakob Nikolas; Han, Tianyu; Nebelung, Sven; Kuhl, Christiane; Stegmaier, Johannes; Truhn, Daniel. Machine Learning in Medical Imaging, MLMI 2023, Pt. II, 2024, 14349: 417-426.
  • [5] ResMLP: Feedforward Networks for Image Classification With Data-Efficient Training. Touvron, Hugo; Bojanowski, Piotr; Caron, Mathilde; Cord, Matthieu; El-Nouby, Alaaeldin; Grave, Edouard; Izacard, Gautier; Joulin, Armand; Synnaeve, Gabriel; Verbeek, Jakob; Jegou, Herve. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(4): 5314-5321.
  • [6] HDKD: Hybrid Data-Efficient Knowledge Distillation Network for Medical Image Classification. EL-Assiouti, Omar S.; Hamed, Ghada; Khattab, Dina; Ebied, Hala M. Engineering Applications of Artificial Intelligence, 2024, 138.
  • [7] Data-Efficient Image Quality Assessment with Attention-Panel Decoder. Qin, Guanyi; Hu, Runze; Liu, Yutao; Zheng, Xiawu; Liu, Haotian; Li, Xiu; Zhang, Yan. Thirty-Seventh AAAI Conference on Artificial Intelligence, Vol. 37, No. 2, 2023: 2091-2100.
  • [8] Data-Efficient Sensor Upgrade Path Using Knowledge Distillation. Van Molle, Pieter; De Boom, Cedric; Verbelen, Tim; Vankeirsbilck, Bert; De Vylder, Jonas; Diricx, Bart; Simoens, Pieter; Dhoedt, Bart. Sensors, 2021, 21(19).
  • [9] Data-Efficient Augmentation for Training Neural Networks. Liu, Tian Yu; Mirzasoleiman, Baharan. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.