Training data-efficient image transformers & distillation through attention

Cited by: 0
|
Authors
Touvron, Hugo [1 ,2 ]
Cord, Matthieu [1 ,2 ]
Douze, Matthijs [1 ]
Massa, Francisco [1 ]
Sablayrolles, Alexandre [1 ]
Jégou, Hervé [1 ]
Affiliations
[1] Facebook AI, Menlo Pk, CA 94025 USA
[2] Sorbonne Univ, Paris, France
Keywords
DOI
None available
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, neural networks purely based on attention were shown to address image understanding tasks such as image classification. These high-performing vision transformers are pre-trained with hundreds of millions of images using a large infrastructure, thereby limiting their adoption. In this work, we produce competitive convolution-free transformers trained on ImageNet only using a single computer in less than 3 days. Our reference vision transformer (86M parameters) achieves top-1 accuracy of 83.1% (single-crop) on ImageNet with no external data. We also introduce a teacher-student strategy specific to transformers. It relies on a distillation token ensuring that the student learns from the teacher through attention, typically from a convnet teacher. The learned transformers are competitive (85.2% top-1 acc.) with the state of the art on ImageNet, and similarly when transferred to other tasks. We will share our code and models.
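The teacher-student strategy described in the abstract supervises a class token with the ground-truth label and a separate distillation token with the teacher's prediction. A minimal plain-Python sketch of the hard-distillation objective is shown below; the function names and logit values are illustrative assumptions, and the equal 1/2-1/2 weighting follows the paper's hard-label variant.

```python
import math

def softmax(logits):
    # numerically stable softmax over a list of logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, target):
    # negative log-likelihood of the target class
    return -math.log(softmax(logits)[target])

def hard_distillation_loss(cls_logits, dist_logits, true_label, teacher_logits):
    # Class token is supervised by the ground-truth label;
    # distillation token is supervised by the teacher's hard prediction (argmax).
    teacher_label = max(range(len(teacher_logits)), key=lambda i: teacher_logits[i])
    return 0.5 * cross_entropy(cls_logits, true_label) \
         + 0.5 * cross_entropy(dist_logits, teacher_label)

# Illustrative call: the student's two heads disagree with their targets,
# so the loss is strictly positive.
loss = hard_distillation_loss([2.0, 0.5, -1.0], [1.5, 1.0, -0.5],
                              true_label=0, teacher_logits=[0.2, 3.0, -1.0])
```

In the full model, `cls_logits` and `dist_logits` would come from two linear heads attached to the class and distillation tokens of the transformer; at inference the paper fuses both heads' predictions.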
Pages: 7358 / 7367
Page count: 10
Related papers
50 items total
  • [31] Data-Efficient Language Shaped Few-shot Image Classification
    Liang, Zhenwen
    Zhang, Xiangliang
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 4680 - 4686
  • [32] Tune It or Don't Use It: Benchmarking Data-Efficient Image Classification
    Brigato, Lorenzo
    Barz, Bjoern
    Iocchi, Luca
    Denzler, Joachim
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 1071 - 1080
  • [33] VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning
    Chen, Jun
    Guo, Han
    Yi, Kai
    Li, Boyang
    Elhoseiny, Mohamed
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 18009 - 18019
  • [34] Labelling with dynamics: A data-efficient learning paradigm for medical image segmentation
    Mo, Yuanhan
    Liu, Fangde
    Yang, Guang
    Wang, Shuo
    Zheng, Jianqing
    Wu, Fuping
    Papiez, Bartlomiej W.
    Mcilwraith, Douglas
    He, Taigang
    Guo, Yike
    MEDICAL IMAGE ANALYSIS, 2024, 95
  • [35] Data-Efficient Graph Learning
    Ding, Kaize
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 20, 2024, : 22663 - 22663
  • [36] Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation
    Cheng, Ruizhe
    Wu, Bichen
    Zhang, Peizhao
    Vajda, Peter
    Gonzalez, Joseph E.
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 3113 - 3118
  • [37] A self-supervised deep learning method for data-efficient training in genomics
    Gündüz, Hüseyin Anil
    Binder, Martin
    To, Xiao-Yin
    Mreches, René
    Bischl, Bernd
    McHardy, Alice C.
    Münch, Philipp C.
    Rezaei, Mina
    COMMUNICATIONS BIOLOGY, 2023, 6 (01)
  • [38] A Data-Efficient Framework for Training and Sim-to-Real Transfer of Navigation Policies
    Bharadhwaj, Homanga
    Wang, Zihan
    Bengio, Yoshua
    Paull, Liam
    2019 INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2019, : 782 - 788
  • [40] A Data-Efficient Training Model for Signal Integrity Analysis based on Transfer Learning
    Zhang, Tingrui
    Chen, Siyu
    Wei, Shuwu
    Chen, Jienan
    2019 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS 2019), 2019, : 182 - 185