Adder Encoder for Pre-trained Language Model

Cited by: 0
Authors
Ding, Jianbang [1 ]
Zhang, Suiyun [1 ]
Li, Linlin [2 ]
Affiliations
[1] Huawei Technol Co Ltd, Shenzhen, Peoples R China
[2] Huawei Noah's Ark Lab, Montreal, PQ, Canada
Keywords
PLMs; Distillation; AdderBERT
DOI
10.1007/978-981-99-6207-5_21
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
BERT, a pre-trained language model built entirely on attention, has proven highly effective for natural language understanding tasks. However, pre-trained language models (PLMs) are computationally expensive and difficult to deploy under limited resources. To reduce the energy burden, we introduce adder operations into the Transformer encoder and propose AdderBERT, a novel encoder with strong representation capability. We then adopt mapping-based distillation to further improve its energy efficiency while preserving performance. Empirical results demonstrate that AdderBERT6 achieves highly competitive performance against its teacher BERTBASE on the GLUE benchmark while reducing energy consumption by 4.9x.
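The abstract points to two ingredients: replacing the multiply-accumulate operations of the encoder's linear projections with addition-only (AdderNet-style, L1-distance) operations, and distilling a shallower student from the BERT teacher via a layer-to-layer mapping loss. Below is a minimal PyTorch sketch of both ideas under those assumptions; the names AdderLinear and mapping_distillation_loss, the layer mapping, and all shapes are illustrative and do not reflect the authors' released implementation.

# Sketch of an adder-style linear layer: the usual dot product x . w is
# replaced by a negative L1 distance -sum(|x - w|), so the forward pass
# uses only additions and subtractions. Illustrative, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdderLinear(nn.Module):
    """Linear layer whose similarity score is -sum(|x - w|) instead of x . w."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., in_features) -> (..., out_features)
        # Broadcast x against every weight row, take the L1 distance over the
        # input dimension, and negate so closer weight vectors score higher.
        diff = x.unsqueeze(-2) - self.weight   # (..., out_features, in_features)
        return -diff.abs().sum(dim=-1) + self.bias


def mapping_distillation_loss(student_hidden, teacher_hidden, layer_map):
    """MSE between mapped student hidden states and their teacher counterparts.

    layer_map[s] = index of the teacher layer that student layer s imitates,
    e.g. {0: 1, 1: 3, 2: 5, 3: 7, 4: 9, 5: 11} for a 6-layer student
    distilled from a 12-layer teacher (an assumed mapping, for illustration).
    """
    losses = [F.mse_loss(student_hidden[s], teacher_hidden[t])
              for s, t in layer_map.items()]
    return torch.stack(losses).mean()


if __name__ == "__main__":
    layer = AdderLinear(768, 768)
    hidden = torch.randn(2, 16, 768)        # (batch, seq_len, hidden_size)
    print(layer(hidden).shape)              # torch.Size([2, 16, 768])

In a full model, such a layer would stand in for the query/key/value and feed-forward projections of each encoder block, and the mapping-based distillation term would be combined with the usual task loss when training the 6-layer student.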
Pages: 339-347
Number of pages: 9