Training data-efficient image transformers & distillation through attention

Cited by: 0
Authors
Touvron, Hugo [1 ,2 ]
Cord, Matthieu [1 ,2 ]
Douze, Matthijs [1 ]
Massa, Francisco [1 ]
Sablayrolles, Alexandre [1 ]
Jégou, Hervé [1]
Affiliations
[1] Facebook AI, Menlo Park, CA 94025 USA
[2] Sorbonne Univ, Paris, France
Keywords
DOI
Not available
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405
Abstract
Recently, neural networks purely based on attention were shown to address image understanding tasks such as image classification. These high-performing vision transformers are pre-trained with hundreds of millions of images using a large infrastructure, thereby limiting their adoption. In this work, we produce competitive convolution-free transformers trained on ImageNet only using a single computer in less than 3 days. Our reference vision transformer (86M parameters) achieves top-1 accuracy of 83.1% (single-crop) on ImageNet with no external data. We also introduce a teacher-student strategy specific to transformers. It relies on a distillation token ensuring that the student learns from the teacher through attention, typically from a convnet teacher. The learned transformers are competitive (85.2% top-1 acc.) with the state of the art on ImageNet, and similarly when transferred to other tasks. We will share our code and models.
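The abstract's teacher-student strategy adds a distillation token whose output is supervised by the teacher's prediction, alongside the class token supervised by the ground-truth label. In the paper's hard-label variant, the two cross-entropy terms are averaged with equal weight. A minimal pure-Python sketch of that objective is shown below (function names are illustrative, not from the authors' released code):

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, target):
    # Negative log-likelihood of the target class index.
    return -math.log(softmax(logits)[target])

def deit_hard_distill_loss(cls_logits, dist_logits, y_true, teacher_logits):
    # Hard-label distillation: the class token's output is trained against
    # the ground-truth label, the distillation token's output against the
    # teacher's argmax prediction; the two losses are averaged.
    y_teacher = max(range(len(teacher_logits)), key=lambda i: teacher_logits[i])
    return 0.5 * cross_entropy(cls_logits, y_true) + \
           0.5 * cross_entropy(dist_logits, y_teacher)
```

At inference time the paper fuses the two heads (e.g. by averaging their softmax outputs); the sketch above covers only the training objective.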
Pages: 7358-7367
Page count: 10
Related papers
50 records in total
  • [11] Data-Efficient Augmentation for Training Neural Networks
    Liu, Tian Yu
    Mirzasoleiman, Baharan
    Advances in Neural Information Processing Systems, 2022, 35
  • [13] Data-Efficient Training Strategies for Neural TTS Systems
    Prajwal, K. R.
    Jawahar, C. V.
    CODS-COMAD 2021: PROCEEDINGS OF THE 3RD ACM INDIA JOINT INTERNATIONAL CONFERENCE ON DATA SCIENCE & MANAGEMENT OF DATA (8TH ACM IKDD CODS & 26TH COMAD), 2021, : 223 - 227
  • [14] Sobolev Training for Data-efficient Approximate Nonlinear MPC
    Lueken, Lukas
    Brandner, Dean
    Lucia, Sergio
    IFAC PAPERSONLINE, 2023, 56 (02): : 5765 - 5772
  • [15] A Data-Efficient Training Method for Deep Reinforcement Learning
    Feng, Wenhui
    Han, Chongzhao
    Lian, Feng
    Liu, Xia
    ELECTRONICS, 2022, 11 (24)
  • [16] Sparse Winning Tickets are Data-Efficient Image Recognizers
    Varma, Mukund T.
    Chen, Xuxi
    Zhang, Zhenyu
    Chen, Tianlong
    Venugopalan, Subhashini
    Wang, Zhangyang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [17] Data-Efficient Image Recognition with Contrastive Predictive Coding
    Henaff, Olivier J.
    Srinivas, Aravind
    De Fauw, Jeffrey
    Razavi, Ali
    Doersch, Carl
    Eslami, S. M. Ali
    van den Oord, Aaron
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [18] GRTr: Generative-Retrieval Transformers for Data-Efficient Dialogue Domain Adaptation
    Shalyminov, Igor
    Sordoni, Alessandro
    Atkinson, Adam
    Schulz, Hannes
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 2484 - 2492
  • [20] Ensembles of data-efficient vision transformers as a new paradigm for automated classification in ecology
    Kyathanahally, S. P.
    Hardeman, T.
    Reyes, M.
    Merz, E.
    Bulas, T.
    Brun, P.
    Pomati, F.
    Baity-Jesi, M.
    SCIENTIFIC REPORTS, 2022, 12 (01)