Adder Encoder for Pre-trained Language Model

Cited: 0
Authors
Ding, Jianbang [1 ]
Zhang, Suiyun [1 ]
Li, Linlin [2 ]
Affiliations
[1] Huawei Technol Co Ltd, Shenzhen, Peoples R China
[2] Huawei Noah's Ark Lab, Montreal, PQ, Canada
Source
Keywords
PLMs; Distillation; AdderBERT
DOI
10.1007/978-981-99-6207-5_21
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
BERT, a pre-trained language model built entirely on attention, has proven highly performant on natural language understanding tasks. However, pre-trained language models (PLMs) are computationally expensive and difficult to deploy with limited resources. To reduce the energy burden, we introduce adder operations into the Transformer encoder and propose AdderBERT, a novel encoder with powerful representation capability. We then adopt mapping-based distillation to further improve its energy efficiency while preserving performance. Empirical results demonstrate that AdderBERT6 achieves highly competitive performance against its teacher BERT-Base on the GLUE benchmark while obtaining a 4.9x reduction in energy consumption.
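The abstract only names the two ideas, so the following is a minimal, hypothetical PyTorch sketch of what they could look like: an adder-style linear layer in which the usual multiply-accumulate is replaced by a negative L1 distance, so the forward pass uses only additions, subtractions, and absolute values; and a mapping-based distillation loss that matches projected student hidden states to teacher hidden states. The names (AdderLinear, mapping_distillation_loss, proj) and all details such as initialization, scaling, and the student-to-teacher layer mapping are assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch only; not the AdderBERT implementation described in the paper.
import torch
import torch.nn as nn


class AdderLinear(nn.Module):
    """Linear layer whose similarity score is -sum|x - w| instead of x . w."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Assumed initialization; the paper's choice is not given in this record.
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., in_features) -> output: (..., out_features)
        # Broadcast to (..., out_features, in_features), take |x_k - w_jk|,
        # sum over k, and negate so that a larger score means "more similar".
        diff = x.unsqueeze(-2) - self.weight
        return -diff.abs().sum(dim=-1) + self.bias


def mapping_distillation_loss(student_hidden, teacher_hidden, proj):
    """MSE between projected student hidden states and teacher hidden states.

    `proj` is an assumed learned mapping that aligns the student's hidden size
    with the teacher's; the layer correspondence (e.g. every second teacher
    layer for a 6-layer student) is fixed outside this function.
    """
    return sum(
        torch.nn.functional.mse_loss(proj(s), t)
        for s, t in zip(student_hidden, teacher_hidden)
    )
```

In a full adder-based encoder, such a layer would presumably stand in for the dense projections inside the attention and feed-forward sublayers, but the exact placement, normalization, and training recipe are not specified in this record.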
Pages: 339-347
Number of Pages: 9