Training data-efficient image transformers & distillation through attention

Cited by: 0
|
Authors
Touvron, Hugo [1 ,2 ]
Cord, Matthieu [1 ,2 ]
Douze, Matthijs [1 ]
Massa, Francisco [1 ]
Sablayrolles, Alexandre [1 ]
Jégou, Hervé [1 ]
Affiliations
[1] Facebook AI, Menlo Pk, CA 94025 USA
[2] Sorbonne Univ, Paris, France
Keywords
DOI
None available
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, neural networks purely based on attention were shown to address image understanding tasks such as image classification. These high-performing vision transformers are pre-trained with hundreds of millions of images using a large infrastructure, thereby limiting their adoption. In this work, we produce competitive convolution-free transformers trained on ImageNet only using a single computer in less than 3 days. Our reference vision transformer (86M parameters) achieves top-1 accuracy of 83.1% (single-crop) on ImageNet with no external data. We also introduce a teacher-student strategy specific to transformers. It relies on a distillation token ensuring that the student learns from the teacher through attention, typically from a convnet teacher. The learned transformers are competitive (85.2% top-1 acc.) with the state of the art on ImageNet, and similarly when transferred to other tasks. We will share our code and models.
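The teacher-student strategy described in the abstract supervises a class token with the ground-truth label and a separate distillation token with the teacher's prediction. A minimal plain-Python sketch of the hard-distillation objective is shown below; the function names and logit values are illustrative assumptions, and the equal 1/2-1/2 weighting follows the paper's hard-label variant.

```python
import math

def softmax(logits):
    # numerically stable softmax over a list of logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, target):
    # negative log-likelihood of the target class
    return -math.log(softmax(logits)[target])

def hard_distillation_loss(cls_logits, dist_logits, true_label, teacher_logits):
    # Class token is supervised by the ground-truth label;
    # distillation token is supervised by the teacher's hard prediction (argmax).
    teacher_label = max(range(len(teacher_logits)), key=lambda i: teacher_logits[i])
    return 0.5 * cross_entropy(cls_logits, true_label) \
         + 0.5 * cross_entropy(dist_logits, teacher_label)

# Illustrative call: the student's two heads disagree with their targets,
# so the loss is strictly positive.
loss = hard_distillation_loss([2.0, 0.5, -1.0], [1.5, 1.0, -0.5],
                              true_label=0, teacher_logits=[0.2, 3.0, -1.0])
```

In the full model, `cls_logits` and `dist_logits` would come from two linear heads attached to the class and distillation tokens of the transformer; at inference the paper fuses both heads' predictions.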
Pages: 7358 / 7367
Page count: 10
Related papers
50 items total
  • [31] Data-Efficient Language Shaped Few-shot Image Classification
    Liang, Zhenwen
    Zhang, Xiangliang
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 4680 - 4686
  • [32] Tune It or Don't Use It: Benchmarking Data-Efficient Image Classification
    Brigato, Lorenzo
    Barz, Bjoern
    Iocchi, Luca
    Denzler, Joachim
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 1071 - 1080
  • [33] VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning
    Chen, Jun
    Guo, Han
    Yi, Kai
    Li, Boyang
    Elhoseiny, Mohamed
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 18009 - 18019
  • [34] Labelling with dynamics: A data-efficient learning paradigm for medical image segmentation
    Mo, Yuanhan
    Liu, Fangde
    Yang, Guang
    Wang, Shuo
    Zheng, Jianqing
    Wu, Fuping
    Papiez, Bartlomiej W.
    Mcilwraith, Douglas
    He, Taigang
    Guo, Yike
    MEDICAL IMAGE ANALYSIS, 2024, 95
  • [35] Data-Efficient Graph Learning
    Ding, Kaize
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 20, 2024, : 22663 - 22663
  • [36] Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation
    Cheng, Ruizhe
    Wu, Bichen
    Zhang, Peizhao
    Vajda, Peter
    Gonzalez, Joseph E.
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 3113 - 3118
  • [37] A self-supervised deep learning method for data-efficient training in genomics
    Gündüz, Hüseyin Anil
    Binder, Martin
    To, Xiao-Yin
    Mreches, René
    Bischl, Bernd
    McHardy, Alice C.
    Münch, Philipp C.
    Rezaei, Mina
    COMMUNICATIONS BIOLOGY, 2023, 6 (01)
  • [38] A Data-Efficient Framework for Training and Sim-to-Real Transfer of Navigation Policies
    Bharadhwaj, Homanga
    Wang, Zihan
    Bengio, Yoshua
    Paull, Liam
    2019 INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2019, : 782 - 788
  • [40] A Data-Efficient Training Model for Signal Integrity Analysis based on Transfer Learning
    Zhang, Tingrui
    Chen, Siyu
    Wei, Shuwu
    Chen, Jienan
    2019 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS 2019), 2019, : 182 - 185