Training data-efficient image transformers & distillation through attention

Cited by: 0

Authors:

Touvron, Hugo [1,2]

Cord, Matthieu [1,2]

Douze, Matthijs [1]

Massa, Francisco [1]

Sablayrolles, Alexandre [1]

Jegou, Herve [1]

Affiliations:

[1] Facebook AI, Menlo Pk, CA 94025 USA

[2] Sorbonne Univ, Paris, France

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139 | 2021 / Vol. 139

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recently, neural networks purely based on attention were shown to address image understanding tasks such as image classification. These high-performing vision transformers are pre-trained with hundreds of millions of images using a large infrastructure, thereby limiting their adoption. In this work, we produce competitive convolution-free transformers trained on ImageNet only using a single computer in less than 3 days. Our reference vision transformer (86M parameters) achieves top-1 accuracy of 83.1% (single-crop) on ImageNet with no external data. We also introduce a teacher-student strategy specific to transformers. It relies on a distillation token ensuring that the student learns from the teacher through attention, typically from a convnet teacher. The learned transformers are competitive (85.2% top-1 acc.) with the state of the art on ImageNet, and similarly when transferred to other tasks. We will share our code and models.
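
The teacher-student idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the token layout or the loss, so everything below is an assumption made for illustration: the function names are hypothetical, the distillation token is simply appended to the input sequence next to a class token, and the loss averages a cross-entropy on the true label with a cross-entropy on the teacher's hard (argmax) prediction with equal weights.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy of a single prediction against an integer class index."""
    return -math.log(softmax(logits)[target])


def build_token_sequence(cls_token, patch_tokens, dist_token):
    """Hypothetical input layout: class token, patch tokens, distillation token.

    All tokens pass through the same attention layers, so the distillation
    token can interact with the patches and with the class token.
    """
    return [cls_token] + list(patch_tokens) + [dist_token]


def hard_distillation_loss(cls_logits, dist_logits, teacher_logits, label):
    """Assumed objective: the class head is trained on the true label and the
    distillation head on the teacher's argmax prediction, weighted 0.5 each."""
    teacher_label = max(range(len(teacher_logits)), key=lambda i: teacher_logits[i])
    return (0.5 * cross_entropy(cls_logits, label)
            + 0.5 * cross_entropy(dist_logits, teacher_label))


if __name__ == "__main__":
    tokens = build_token_sequence("CLS", ["p0", "p1", "p2"], "DIST")
    print(len(tokens))
    print(hard_distillation_loss([2.0, 0.5], [0.1, 1.5], [0.3, 0.9], label=0))
```

In this sketch the teacher only supplies a label, so any classifier (for example a convnet, as the abstract suggests) could play that role without sharing architecture with the student.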

Citation:

Pages: 7358-7367

Number of pages: 10
