Transformers Meet Small Datasets

Cited by: 5
Authors
Shao, Ran [1 ,2 ]
Bi, Xiao-Jun [3 ]
Affiliations
[1] Harbin Engn Univ, Coll Informat & Commun Engn, Harbin 150001, Peoples R China
[2] Harbin Vocat & Tech Coll, Coll Elect & Informat Engn, Harbin 150001, Peoples R China
[3] Minzu Univ China, Dept Informat Engn, Beijing 100081, Peoples R China
Keywords
Convolutional neural networks; small datasets; transformer; vision transformer;
DOI
10.1109/ACCESS.2022.3221138
CLC number
TP [Automation and Computer Technology]
Discipline code
0812
Abstract
The research and application areas of transformers have expanded considerably owing to the success of vision transformers (ViTs). However, because pure transformer architectures lack the ability to capture local content, they cannot be trained effectively on small datasets from scratch. In this work, we propose a new hybrid model that combines the transformer with a convolutional neural network (CNN) and improves classification performance on small datasets. This is accomplished by introducing more convolution operations into the transformer's two core sections: 1) in place of the original multi-head attention mechanism, we design a convolutional parameter sharing multi-head attention (CPSA) block that incorporates a convolutional parameter-sharing projection into the attention mechanism; 2) the feed-forward network in each transformer encoder block is replaced with a local feed-forward network (LFFN) block that introduces a sandglass block with additional depth-wise convolutions, providing more locality to the transformer. When training from scratch on four small datasets, we achieve state-of-the-art results compared with transformers and CNNs, without extensive computing resources or auxiliary training. The proposed strategy opens up new paths for applying transformers to small datasets.
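The abstract does not give the exact CPSA/LFFN definitions, but the LFFN idea — wrapping a channel bottleneck ("sandglass") between depth-wise convolutions, so each token mixes information only with its spatial neighbors — can be sketched. The following is a hypothetical NumPy illustration; the function names, tensor shapes, reduction ratio, and residual placement are all assumptions, not the authors' implementation:

```python
import numpy as np

def depthwise_conv2d(x, kernels, padding=1):
    """Depth-wise convolution: each channel is filtered by its own kernel,
    adding spatial locality without mixing channels (cheap in FLOPs).
    x: (C, H, W) feature map; kernels: (C, k, k)."""
    C, H, W = x.shape
    k = kernels.shape[-1]
    xp = np.pad(x, ((0, 0), (padding, padding), (padding, padding)))
    out = np.zeros_like(x, dtype=float)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + k, j:j + k] * kernels[c])
    return out

def lffn_sandglass(tokens, H, W, dw1, dw2, w_reduce, w_expand):
    """Sketch of a sandglass-style local feed-forward block (assumed design):
    depth-wise conv -> channel reduction -> ReLU -> channel expansion ->
    depth-wise conv, with a residual connection around the whole block.
    tokens: (N, C) token sequence, reshaped to a (C, H, W) map for the convs."""
    C = tokens.shape[1]
    x = tokens.T.reshape(C, H, W)              # tokens back to a spatial map
    x = depthwise_conv2d(x, dw1)               # local mixing at full width
    x = np.einsum('chw,cd->dhw', x, w_reduce)  # bottleneck: C -> C//r channels
    x = np.maximum(x, 0)                       # non-linearity in the neck
    x = np.einsum('dhw,dc->chw', x, w_expand)  # expand back to C channels
    x = depthwise_conv2d(x, dw2)               # second depth-wise conv
    return x.reshape(C, H * W).T + tokens      # residual connection
```

Because the convolutions are depth-wise and the dense projections act only across channels, the block stays lightweight while every token's update now depends on its 3x3 spatial neighborhood, which is the locality the pure transformer lacks.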
Pages: 118454-118464
Number of pages: 11
Related papers
50 records in total
  • [31] DESIGNING PULSE TRANSFORMERS FOR SMALL SIZE
    LEE, R
    IEEE TRANSACTIONS ON MAGNETICS, 1977, 13 (05) : 1220 - 1223
  • [32] Optimal Shapes of Small Transformers.
    Kersic, Nikolaj
    Kokelj, Peter
    Turk, Ivo
    Elektrotehniski Vestnik/Electrotechnical Review, 1979, 46 (04): : 232 - 237
  • [33] GAPFORMER: Fast Autoregressive Transformers meet RNNs for Personalized Adaptive Cruise Control
    Sachdeva, Noveen
    Wang, Ziran
    Han, Kyungtae
    Gupta, Rohit
    McAuley, Julian
    2022 IEEE 25TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), 2022, : 2528 - 2535
  • [34] Uncertainty evaluations from small datasets
    Stoudt, Sara
    Pintar, Adam
    Possolo, Antonio
    METROLOGIA, 2021, 58 (01)
  • [35] Better Classifier Calibration for Small Datasets
    Alasalmi, Tuomo
    Suutala, Jaakko
    Roning, Juha
    Koskimaki, Heli
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2020, 14 (03)
  • [36] Simple entropy estimator for small datasets
    Montalvao, J.
    Silva, D. G.
    Attux, R.
    ELECTRONICS LETTERS, 2012, 48 (17) : 1059 - 1060
  • [37] A Bayesian independence test for small datasets
    Ku, Chin-Jen
    Fine, Terrence L.
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2006, 54 (10) : 4026 - 4031
  • [38] Neural network modeling for small datasets
    Ingrassia, S
    Morlini, I
    TECHNOMETRICS, 2005, 47 (03) : 297 - 311
  • [39] DESIGNING SMALL CARS TO MEET REGULATIONS
    MARKS, C
    FISCHER, RG
    STEWART, EE
    AUTOMOTIVE ENGINEERING, 1974, 82 (10): : 33 - 37
  • [40] Face alignment by learning from small real datasets and large synthetic datasets
    Gao, Haoqi
    Ogawara, Koichi
    2022 ASIA CONFERENCE ON ALGORITHMS, COMPUTING AND MACHINE LEARNING (CACML 2022), 2022, : 397 - 402