Adaptable Adapters

Cited by: 0
Authors
Moosavi, Nafise Sadat [1 ]
Delfosse, Quentin [3 ]
Kersting, Kristian [2 ,3 ]
Gurevych, Iryna [2 ,4 ]
Affiliations
[1] Univ Sheffield, Dept Comp Sci, Sheffield, S Yorkshire, England
[2] Tech Univ Darmstadt, Hessian Ctr AI Hessian AI, Darmstadt, Germany
[3] Tech Univ Darmstadt, AI & Machine Learning Lab, Darmstadt, Germany
[4] Tech Univ Darmstadt, Ubiquitous Knowledge Proc Lab UKP Lab, Dept Comp Sci, Darmstadt, Germany
Keywords
DOI
None available
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
State-of-the-art pretrained NLP models contain from hundreds of millions to trillions of parameters. Adapters provide a parameter-efficient alternative to full finetuning, in which only lightweight neural network layers are trained on top of the frozen pretrained weights. Adapter layers are initialized randomly. However, existing work uses the same adapter architecture, that is, the same adapter layer on top of each layer of the pretrained model, for every dataset, regardless of the properties of the dataset or the amount of available training data. In this work, we introduce adaptable adapters that (1) learn different activation functions for different layers and different input data, and (2) include a learnable switch to select and use only the beneficial adapter layers. We show that adaptable adapters achieve performance on par with the standard adapter architecture while using a considerably smaller number of adapter layers. In addition, we show that the adapter architecture selected by adaptable adapters transfers well across different data settings and similar tasks. We propose using adaptable adapters to design efficient and effective adapter architectures. The resulting adapters (a) contain about 50% of the learnable parameters of the standard adapter, and are therefore more efficient at training and inference and require less storage space, and (b) achieve considerably higher performance in low-data settings.
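The two ideas in the abstract, a per-layer learnable activation and a switch that can turn a whole adapter layer off, can be illustrated with a minimal, dependency-free sketch. All names, shapes, and the rational-activation parameterization below are illustrative assumptions for exposition, not the authors' implementation:

```python
def linear(x, W, b):
    """y = W x + b for a vector x (plain-Python matrix-vector product)."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def rational(x, num=(0.0, 1.0), den=(1.0,)):
    """Sketch of a learnable rational activation P(x) / (1 + |Q(x)|).
    The coefficient tuples `num` and `den` would be trained per layer,
    letting each layer learn its own nonlinearity (defaults give
    x / (1 + |x|)); this parameterization is an assumption."""
    p = sum(c * x ** i for i, c in enumerate(num))
    q = 1.0 + abs(sum(c * x ** (i + 1) for i, c in enumerate(den)))
    return p / q

def adapter_block(h, W_down, b_down, W_up, b_up, switch):
    """One adapter layer gated by a hard switch. When the trained switch
    is 'off', the block reduces to the identity, so that layer can simply
    be dropped at inference, which is where the parameter savings come from."""
    if switch < 0.5:
        return list(h)
    z = [rational(v) for v in linear(h, W_down, b_down)]   # down-project + activation
    u = linear(z, W_up, b_up)                              # up-project
    return [hi + ui for hi, ui in zip(h, u)]               # residual add
```

With the switch off, the block returns its input unchanged; with it on, the bottleneck transformation is added to the residual stream. In the paper's setting the switch itself is learned jointly with the adapter weights.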
Pages: 3742 - 3753
Page count: 12
Related papers
50 total
  • [31] POLARIZING ADAPTERS FOR THE WOLFE GONIOMETER
    WOLFE, CW
    AMERICAN MINERALOGIST, 1959, 44 (1-2) : 182 - 184
  • [32] MAC ADAPTERS EMBRACE ETHERNET
    DIEHL, S
    BYTE, 1990, 15 (01): : 203 - 205
  • [33] CORRECTION FOR ADAPTERS IN MICROWAVE MEASUREMENTS
    UHLIR, A
    IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES, 1974, MTT-22 (03) : 330 - 332
  • [34] Pruning Adapters with Lottery Ticket
    Wu, Jiarun
    Chen, Qingliang
    ALGORITHMS, 2022, 15 (02)
  • [35] Common knowledge: Metaprogrammed adapters
    Dewhurst, Steve
    C/C++ Users Journal, 2002, 20 (04):
  • [36] Personalised Aesthetics with Residual Adapters
    Rodriguez-Pardo, Carlos
    Bilen, Hakan
    PATTERN RECOGNITION AND IMAGE ANALYSIS, PT I, 2020, 11867 : 508 - 520
  • [37] ENZYMES AS ORBITAL SYMMETRY ADAPTERS
    FERREIRA, R
    JOURNAL OF THEORETICAL BIOLOGY, 1973, 39 (03) : 665 - 668
  • [38] AdapterDrop: On the Efficiency of Adapters in Transformers
    Rueckle, Andreas
    Geigle, Gregor
    Glockner, Max
    Beck, Tilman
    Pfeiffer, Jonas
    Reimers, Nils
    Gurevych, Iryna
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 7930 - 7946
  • [39] Stethoscope adapters - getting an earfull
    Seppanen, K
    AUSTRALIAN VETERINARY PRACTITIONER, 2004, 34 (01): : 41 - 41
  • [40] FIRMWARE APPROACH TO DATA ADAPTERS
    WOLF, P
    ELECTRONIC ENGINEER, 1971, 30 (12): DC11