Pay Attention to MLPs

Cited by: 0
Authors
Liu, Hanxiao [1 ]
Dai, Zihang [1 ]
So, David R. [1 ]
Le, Quoc V. [1 ]
Affiliations
[1] Google Research, Brain Team, Mountain View, CA 94043, USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Transformers [1] have become one of the most important architectural innovations in deep learning and have enabled many breakthroughs over the past few years. Here we propose a simple network architecture, gMLP, based on MLPs with gating, and show that it can perform as well as Transformers in key language and vision applications. Our comparisons show that self-attention is not critical for Vision Transformers, as gMLP can achieve the same accuracy. For BERT, our model achieves parity with Transformers on pretraining perplexity and is better on some downstream NLP tasks. On finetuning tasks where gMLP performs worse, making the gMLP model substantially larger can close the gap with Transformers. In general, our experiments show that gMLP can scale as well as Transformers over increased data and compute.
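The core idea summarized in the abstract is to replace self-attention with static, learned cross-token interactions inside an otherwise ordinary MLP block, combined with multiplicative gating. The sketch below is a minimal PyTorch rendering of a gMLP block with a spatial gating unit, written for illustration only; the class names, dimensions, and initialization shown here are assumptions about a reasonable implementation, not the authors' released code.

```python
# Hypothetical sketch of a gMLP block with a Spatial Gating Unit (SGU).
# Names, sizes, and initialization are illustrative assumptions.
import torch
import torch.nn as nn

class SpatialGatingUnit(nn.Module):
    """Splits channels in half and gates one half with a learned linear
    projection applied across the sequence (token) dimension."""
    def __init__(self, d_ffn: int, seq_len: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_ffn // 2)
        # Linear projection over the spatial (token) axis.
        self.spatial_proj = nn.Linear(seq_len, seq_len)
        # Initialize near identity so the block starts close to a plain MLP.
        nn.init.zeros_(self.spatial_proj.weight)
        nn.init.ones_(self.spatial_proj.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u, v = x.chunk(2, dim=-1)             # (batch, seq, d_ffn/2) each
        v = self.norm(v)
        v = self.spatial_proj(v.transpose(1, 2)).transpose(1, 2)
        return u * v                          # element-wise gating

class GMLPBlock(nn.Module):
    """Norm -> channel expansion -> GELU -> spatial gating -> channel
    projection, wrapped in a residual connection (no self-attention)."""
    def __init__(self, d_model: int, d_ffn: int, seq_len: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.proj_in = nn.Linear(d_model, d_ffn)
        self.act = nn.GELU()
        self.sgu = SpatialGatingUnit(d_ffn, seq_len)
        self.proj_out = nn.Linear(d_ffn // 2, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shortcut = x
        x = self.norm(x)
        x = self.act(self.proj_in(x))
        x = self.sgu(x)
        x = self.proj_out(x)
        return x + shortcut

# Usage example (hypothetical sizes):
block = GMLPBlock(d_model=256, d_ffn=1024, seq_len=128)
out = block(torch.randn(2, 128, 256))   # -> (2, 128, 256)
```

The notable design choice is that token mixing comes from a fixed-size linear projection over the sequence axis, so the sequence length is a model hyperparameter, rather than from input-dependent attention weights.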
Pages: 12
Related papers
50 records in total
  • [1] Can attention enable MLPs to catch up with CNNs?
    Guo, Meng-Hao
    Liu, Zheng-Ning
    Mu, Tai-Jiang
    Liang, Dun
    Martin, Ralph R.
    Hu, Shi-Min
    [J]. COMPUTATIONAL VISUAL MEDIA, 2021, 7 (03) : 283 - 288
  • [2] Does It Pay to Pay Attention?
    Gargano, Antonio
    Rossi, Alberto G.
    [J]. REVIEW OF FINANCIAL STUDIES, 2018, 31 (12) : 4595 - 4649
  • [3] Pay Attention or Pay the Price
    Suchy, Adam
    [J]. AGRESIVITA NA CESTACH, 2009 : 37+
  • [4] PAY ATTENTION
    HERMAN, G
    [J]. MICROCOMPUTING, 1983, 7 (08) : 28 - 28
  • [5] Pay Attention
    Jaekl, Phil
    [J]. SCIENTIST, 2018, 32 (12) : 15 - 17
  • [6] PAY ATTENTION
    TRABASSO, T
    [J]. PSYCHOLOGY TODAY, 1968, 2 (05) : 30 - 36
  • [7] Pay attention
    Flanagan, PR
    [J]. LAB ANIMAL, 1998, 27 (02) : 20 - 20
  • [8] Pay attention
    Wieder, S
    [J]. NEW SCIENTIST, 2002, 174 (2340) : 60 - 61