Training data-efficient image transformers & distillation through attention

Cited by: 0

Authors:

Touvron, Hugo [1,2]

Cord, Matthieu [1,2]

Douze, Matthijs [1]

Massa, Francisco [1]

Sablayrolles, Alexandre [1]

Jegou, Herve [1]

Affiliations:

[1] Facebook AI, Menlo Pk, CA 94025 USA

[2] Sorbonne Univ, Paris, France

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139 | 2021 / Vol. 139

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recently, neural networks purely based on attention were shown to address image understanding tasks such as image classification. These high-performing vision transformers are pre-trained with hundreds of millions of images using a large infrastructure, thereby limiting their adoption. In this work, we produce competitive convolution-free transformers trained on ImageNet only using a single computer in less than 3 days. Our reference vision transformer (86M parameters) achieves top-1 accuracy of 83.1% (single-crop) on ImageNet with no external data. We also introduce a teacher-student strategy specific to transformers. It relies on a distillation token ensuring that the student learns from the teacher through attention, typically from a convnet teacher. The learned transformers are competitive (85.2% top-1 acc.) with the state of the art on ImageNet, and similarly when transferred to other tasks. We will share our code and models.
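The teacher-student strategy mentioned in the abstract can be illustrated with a small loss computation. The following is a minimal sketch, not the authors' implementation: it assumes a hard-label variant in which the student has two output heads (one fed by the class token, one by the distillation token), the class head is supervised by the ground-truth label, the distillation head by the teacher's top prediction, and the two terms are weighted equally. The function names and the 0.5/0.5 weighting are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy between softmax(logits) and a single integer class index."""
    return -math.log(softmax(logits)[target])


def hard_distillation_loss(cls_logits, dist_logits, teacher_logits, label):
    """Hypothetical two-head objective (illustrative assumption).

    cls_logits:     student logits from the class-token head
    dist_logits:    student logits from the distillation-token head
    teacher_logits: logits from a frozen teacher (e.g. a convnet)
    label:          ground-truth class index
    """
    # The teacher's hard decision is its highest-scoring class.
    teacher_label = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)
    # Equal weighting of the two terms is an assumption of this sketch.
    return 0.5 * cross_entropy(cls_logits, label) + 0.5 * cross_entropy(
        dist_logits, teacher_label
    )
```

In this sketch the teacher only contributes a target label, so no gradient would flow into it; the distillation head is pulled toward the teacher's decision while the class head follows the true label.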

Pages: 7358-7367

Page count: 10

50 records in total

[21] FEDERATED SELF-TRAINING FOR DATA-EFFICIENT AUDIO RECOGNITION
Tsouvalas, Vasileios
Saeed, Aaqib
Ozcelebi, Tanir
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 476 - 480
[22] DEA: Data-efficient augmentation for interpretable medical image segmentation
Wu, Xing
Li, Zhi
Tao, Chenjie
Han, Xianhua
Chen, Yen-Wei
Yao, Junfeng
Zhang, Jian
Sun, Qun
Li, Weimin
Liu, Yue
Guo, Yike
BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2024, 89
[23] Data-Efficient Histopathology Image Analysis with Deformation Representation Learning
Xu, Jilan
Hou, Junlin
Zhang, Yuejie
Feng, Rui
Ruan, Chunyang
Zhang, Tao
Fan, Weiguo
2020 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2020, : 857 - 864
[24] Data-Efficient Generation of Protein Conformational Ensembles with Backbone-to-Side-Chain Transformers
Chennakesavalu, Shriram
Rotskoff, Grant M.
JOURNAL OF PHYSICAL CHEMISTRY B, 2024, 128 (09): : 2114 - 2123
[25] Data-Efficient Knowledge Distillation with Teacher Assistant-Based Dynamic Objective Alignment
Xu, Yangyan
Cao, Cong
Yuan, Fangfang
Mi, Rongxin
Wang, Dakui
Liu, Yanbing
Su, Majing
COMPUTATIONAL SCIENCE, ICCS 2024, PT I, 2024, 14832 : 181 - 195
[26] Data-efficient image captioning of fine art paintings via virtual-real semantic alignment training
Lu, Yue
Guo, Chao
Dai, Xingyuan
Wang, Fei-Yue
NEUROCOMPUTING, 2022, 490 : 163 - 180
[27] Data-Efficient Policy Evaluation Through Behavior Policy Search
Hanna, Josiah P.
Chandak, Yash
Thomas, Philip S.
White, Martha
Stone, Peter
Niekum, Scott
JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25 : 1 - 58
[28] Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models
Wang, Zhendong
Jiang, Yifan
Zheng, Huangjie
Wang, Peihao
He, Pengcheng
Wang, Zhangyang
Chen, Weizhu
Zhou, Mingyuan
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
[29] Data-Efficient Policy Evaluation Through Behavior Policy Search
Hanna, Josiah P.
Thomas, Philip S.
Stone, Peter
Niekum, Scott
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
[30] Author Correction: Ensembles of data-efficient vision transformers as a new paradigm for automated classification in ecology
S. P. Kyathanahally
T. Hardeman
M. Reyes
E. Merz
T. Bulas
P. Brun
F. Pomati
M. Baity-Jesi
Scientific Reports, 13
