Adder Encoder for Pre-trained Language Model

Cited by: 0
Authors
Ding, Jianbang [1 ]
Zhang, Suiyun [1 ]
Li, Linlin [2 ]
Affiliations
[1] Huawei Technol Co Ltd, Shenzhen, Peoples R China
[2] Huawei Noah's Ark Lab, Montreal, PQ, Canada
Keywords
PLMs; Distillation; AdderBERT
DOI
10.1007/978-981-99-6207-5_21
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
BERT, a pre-trained language model built entirely on attention, has proven highly effective for natural language understanding tasks. However, pre-trained language models (PLMs) are computationally expensive and difficult to deploy under limited resources. To reduce the energy burden, we introduce adder operations into the Transformer encoder and propose AdderBERT, a novel encoder with strong representation capability. We then adopt mapping-based distillation to further improve its energy efficiency while preserving performance. Empirical results demonstrate that AdderBERT6 achieves highly competitive performance against its teacher BERTBASE on the GLUE benchmark while reducing energy consumption by 4.9x.
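The abstract points to two ingredients: replacing the multiply-accumulate operations of the encoder's linear projections with addition-only (AdderNet-style, L1-distance) operations, and distilling a shallower student from the BERT teacher via a layer-to-layer mapping loss. Below is a minimal PyTorch sketch of both ideas under those assumptions; the names AdderLinear and mapping_distillation_loss, the layer mapping, and all shapes are illustrative and do not reflect the authors' released implementation.

# Sketch of an adder-style linear layer: the usual dot product x . w is
# replaced by a negative L1 distance -sum(|x - w|), so the forward pass
# uses only additions and subtractions. Illustrative, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdderLinear(nn.Module):
    """Linear layer whose similarity score is -sum(|x - w|) instead of x . w."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., in_features) -> (..., out_features)
        # Broadcast x against every weight row, take the L1 distance over the
        # input dimension, and negate so closer weight vectors score higher.
        diff = x.unsqueeze(-2) - self.weight   # (..., out_features, in_features)
        return -diff.abs().sum(dim=-1) + self.bias


def mapping_distillation_loss(student_hidden, teacher_hidden, layer_map):
    """MSE between mapped student hidden states and their teacher counterparts.

    layer_map[s] = index of the teacher layer that student layer s imitates,
    e.g. {0: 1, 1: 3, 2: 5, 3: 7, 4: 9, 5: 11} for a 6-layer student
    distilled from a 12-layer teacher (an assumed mapping, for illustration).
    """
    losses = [F.mse_loss(student_hidden[s], teacher_hidden[t])
              for s, t in layer_map.items()]
    return torch.stack(losses).mean()


if __name__ == "__main__":
    layer = AdderLinear(768, 768)
    hidden = torch.randn(2, 16, 768)        # (batch, seq_len, hidden_size)
    print(layer(hidden).shape)              # torch.Size([2, 16, 768])

In a full model, such a layer would stand in for the query/key/value and feed-forward projections of each encoder block, and the mapping-based distillation term would be combined with the usual task loss when training the 6-layer student.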
Pages: 339-347
Number of pages: 9