Optimizing Deeper Transformers on Small Datasets

Cited by: 0
Authors:
Xu, Peng [1 ]
Kumar, Dhruv [1 ,2 ]
Yang, Wei [1 ]
Zi, Wenjie [1 ]
Tang, Keyi [1 ]
Huang, Chenyang [1 ,5 ]
Cheung, Jackie Chi Kit [1 ,3 ,4 ]
Prince, Simon J. D. [1 ]
Cao, Yanshuai [1 ]
Affiliations:
[1] Borealis AI, Toronto, ON, Canada
[2] Univ Waterloo, Waterloo, ON, Canada
[3] McGill Univ, Montreal, PQ, Canada
[4] Mila, Canada CIFAR Chair, Montreal, PQ, Canada
[5] Univ Alberta, Edmonton, AB, Canada
DOI: not available
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
It is a common belief that training deep transformers from scratch requires large datasets. Consequently, for small datasets, people usually use shallow and simple additional layers on top of pre-trained models during fine-tuning. This work shows that this does not always need to be the case: with proper initialization and optimization, the benefits of very deep transformers can carry over to challenging tasks with small datasets, including Text-to-SQL semantic parsing and logical reading comprehension. In particular, we successfully train 48 layers of transformers, comprising 24 fine-tuned layers from pre-trained RoBERTa and 24 relation-aware layers trained from scratch. With fewer training steps and no task-specific pre-training, we obtain state-of-the-art performance on the challenging cross-domain Text-to-SQL parsing benchmark Spider. We achieve this by deriving a novel Data-dependent Transformer Fixed-update initialization scheme (DT-Fixup), inspired by the prior T-Fixup work (Huang et al., 2020). Further error analysis shows that increasing depth can help improve generalization on small datasets for hard cases that require reasoning and structural understanding.
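The abstract describes an initialization scheme that shrinks weights by a depth-dependent factor so that residual updates stay bounded in a very deep stack. The exact DT-Fixup factor is data-dependent and derived in the paper; as a rough illustrative sketch only, a T-Fixup-style depth-scaled initialization (the non-data-dependent precursor) can be written as below. The function name and the (9N)^{-1/4} shrink factor are assumptions for illustration, not the paper's exact scheme:

```python
import math
import numpy as np

def depth_scaled_init(fan_in, fan_out, num_layers, rng=None):
    """Sample a Xavier-uniform weight matrix, then shrink it by a
    depth-dependent factor in the spirit of T-Fixup-style schemes.

    Illustrative only: DT-Fixup derives a data-dependent scale from the
    inputs; here we use a fixed (9 * num_layers) ** -0.25 shrink, so
    deeper stacks start with proportionally smaller weights.
    """
    rng = np.random.default_rng() if rng is None else rng
    bound = math.sqrt(6.0 / (fan_in + fan_out))          # Xavier-uniform bound
    w = rng.uniform(-bound, bound, size=(fan_out, fan_in))
    return w * (9 * num_layers) ** -0.25                 # depth-dependent shrink
```

Intuitively, the shrink factor keeps the sum of per-layer residual contributions from growing with depth, which is what allows training such stacks without a learning-rate warm-up stage.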
Pages: 2089-2102 (14 pages)