Training data-efficient image transformers & distillation through attention

Cited: 0
Authors
Touvron, Hugo [1 ,2 ]
Cord, Matthieu [1 ,2 ]
Douze, Matthijs [1 ]
Massa, Francisco [1 ]
Sablayrolles, Alexandre [1 ]
Jegou, Herve [1 ]
Affiliations
[1] Facebook AI, Menlo Park, CA 94025, USA
[2] Sorbonne Univ, Paris, France
Keywords
DOI
Not available
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Recently, neural networks based purely on attention were shown to address image understanding tasks such as image classification. These high-performing vision transformers are pre-trained with hundreds of millions of images using a large infrastructure, thereby limiting their adoption. In this work, we produce competitive convolution-free transformers trained on ImageNet only, using a single computer, in less than 3 days. Our reference vision transformer (86M parameters) achieves a top-1 accuracy of 83.1% (single-crop) on ImageNet with no external data. We also introduce a teacher-student strategy specific to transformers. It relies on a distillation token that ensures the student learns from the teacher through attention, typically from a convnet teacher. The learned transformers are competitive (85.2% top-1 acc.) with the state of the art on ImageNet, and similarly so when transferred to other tasks. We will share our code and models.
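The distillation token described in the abstract can be sketched in a few lines of PyTorch. The following is a minimal illustration, not the authors' released code: the names DistilledViT and hard_distillation_loss, the generic encoder argument, and the equal 0.5/0.5 loss weighting are assumptions made here for clarity, following the hard-distillation objective the paper describes.

```python
# Minimal sketch (assumed, not the authors' released code) of DeiT-style
# distillation through attention: a learnable distillation token is appended
# next to the class token, interacts with the patch tokens via self-attention,
# and its dedicated head is supervised by the teacher's hard predictions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistilledViT(nn.Module):  # hypothetical wrapper name
    def __init__(self, encoder: nn.Module, dim: int = 768, num_classes: int = 1000):
        super().__init__()
        self.encoder = encoder  # any transformer encoder over (B, N, dim) token sequences
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.dist_token = nn.Parameter(torch.zeros(1, 1, dim))  # the distillation token
        self.head = nn.Linear(dim, num_classes)       # predicts from the class token
        self.head_dist = nn.Linear(dim, num_classes)  # predicts from the distillation token

    def forward(self, patch_tokens: torch.Tensor):
        # patch_tokens: (B, N, dim) embedded image patches
        b = patch_tokens.size(0)
        tokens = torch.cat(
            [self.cls_token.expand(b, -1, -1),
             self.dist_token.expand(b, -1, -1),
             patch_tokens], dim=1)
        out = self.encoder(tokens)  # both extra tokens attend to the patches
        return self.head(out[:, 0]), self.head_dist(out[:, 1])

def hard_distillation_loss(logits_cls, logits_dist, teacher_logits, labels):
    # The class-token head learns from the true labels, while the
    # distillation-token head learns from the teacher's hard decisions
    # (argmax over the convnet teacher's logits).
    teacher_labels = teacher_logits.argmax(dim=1)
    return 0.5 * F.cross_entropy(logits_cls, labels) \
         + 0.5 * F.cross_entropy(logits_dist, teacher_labels)
```

At inference the two heads can be fused; a simple late fusion is to average their softmax outputs, so that the class and distillation tokens vote jointly on the predicted label.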
Pages: 7358-7367
Number of pages: 10
Related Papers
50 records in total
  • [1] DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers. Chen, Xianing; Cao, Qiong; Zhong, Yujie; Zhang, Jing; Gao, Shenghua; Tao, Dacheng. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022: 12042-12052.
  • [2] UAV Image Multi-Labeling with Data-Efficient Transformers. Bashmal, Laila; Bazi, Yakoub; Al Rahhal, Mohamad Mahmoud; Alhichri, Haikel; Al Ajlan, Naif. Applied Sciences-Basel, 2021, 11(9).
  • [3] Towards Data-Efficient Detection Transformers. Wang, Wen; Zhang, Jing; Cao, Yang; Shen, Yongliang; Tao, Dacheng. Computer Vision, ECCV 2022, Pt. IX, 2022, 13669: 88-105.
  • [4] Cascaded Cross-Attention Networks for Data-Efficient Whole-Slide Image Classification Using Transformers. Khader, Firas; Kather, Jakob Nikolas; Han, Tianyu; Nebelung, Sven; Kuhl, Christiane; Stegmaier, Johannes; Truhn, Daniel. Machine Learning in Medical Imaging, MLMI 2023, Pt. II, 2024, 14349: 417-426.
  • [5] ResMLP: Feedforward Networks for Image Classification With Data-Efficient Training. Touvron, Hugo; Bojanowski, Piotr; Caron, Mathilde; Cord, Matthieu; El-Nouby, Alaaeldin; Grave, Edouard; Izacard, Gautier; Joulin, Armand; Synnaeve, Gabriel; Verbeek, Jakob; Jegou, Herve. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(4): 5314-5321.
  • [6] HDKD: Hybrid Data-Efficient Knowledge Distillation Network for Medical Image Classification. EL-Assiouti, Omar S.; Hamed, Ghada; Khattab, Dina; Ebied, Hala M. Engineering Applications of Artificial Intelligence, 2024, 138.
  • [7] Data-Efficient Image Quality Assessment with Attention-Panel Decoder. Qin, Guanyi; Hu, Runze; Liu, Yutao; Zheng, Xiawu; Liu, Haotian; Li, Xiu; Zhang, Yan. Thirty-Seventh AAAI Conference on Artificial Intelligence, Vol. 37, No. 2, 2023: 2091-2100.
  • [8] Data-Efficient Sensor Upgrade Path Using Knowledge Distillation. Van Molle, Pieter; De Boom, Cedric; Verbelen, Tim; Vankeirsbilck, Bert; De Vylder, Jonas; Diricx, Bart; Simoens, Pieter; Dhoedt, Bart. Sensors, 2021, 21(19).
  • [9] Data-Efficient Augmentation for Training Neural Networks. Liu, Tian Yu; Mirzasoleiman, Baharan. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.