On the Sentence Embeddings from Pre-trained Language Models

被引：0

作者：

Li, Bohan ^{[1
,2
]}

Zhou, Hao ^{[1
]}

He, Junxian ^{[2
]}

Wang, Mingxuan ^{[1
]}

Yang, Yiming ^{[2
]}

Li, Lei ^{[1
]}

机构：

[1] ByteDance AI Lab, Warsaw, Poland

[2] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA

来源：

PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP) | 2020年

关键词：

D O I：

暂无

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Pre-trained contextual representations like BERT have achieved great success in natural language processing. However, the sentence embeddings from the pre-trained language models without fine-tuning have been found to poorly capture semantic meaning of sentences. In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited. We first reveal the theoretical connection between the masked language model pre-training objective and the semantic similarity task theoretically, and then analyze the BERT sentence embeddings empirically. We find that BERT always induces a non-smooth anisotropic semantic space of sentences, which harms its performance of semantic similarity. To address this issue, we propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective. Experimental results show that our proposed BERT-flow method obtains significant performance gains over the state-of-the-art sentence embeddings on a variety of semantic textual similarity tasks. The code is available at https://github.com/bohanli/BERT- flow.

引用

页码：9119 / 9130

页数：12

共 50 条

[41] Pre-trained language models for keyphrase prediction: A review
Umair, Muhammad
Sultana, Tangina
Lee, Young-Koo
[J]. ICT EXPRESS, 2024, 10 (04): : 871 - 890
[42] Deep Entity Matching with Pre-Trained Language Models
Li, Yuliang
Li, Jinfeng
Suhara, Yoshihiko
Doan, AnHai
Tan, Wang-Chiew
[J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 14 (01): : 50 - 60
[43] A Close Look into the Calibration of Pre-trained Language Models
Chen, Yangyi
Yuan, Lifan
Cui, Ganqu
Liu, Zhiyuan
Ji, Heng
[J]. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 1343 - 1367
[44] Exploring Lottery Prompts for Pre-trained Language Models
Chen, Yulin
Ding, Ning
Wang, Xiaobin
Hu, Shengding
Zheng, Hai-Tao
Liu, Zhiyuan
Xie, Pengjun
[J]. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 15428 - 15444
[45] Context Analysis for Pre-trained Masked Language Models
Lai, Yi-An
Lalwani, Garima
Zhang, Yi
[J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 3789 - 3804
[46] Pre-trained language models: What do they know?
Guimaraes, Nuno
Campos, Ricardo
Jorge, Alipio
[J]. WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2024, 14 (01)
[47] Pre-trained models for natural language processing: A survey
QIU XiPeng
SUN TianXiang
XU YiGe
SHAO YunFan
DAI Ning
HUANG XuanJing
[J]. Science China Technological Sciences, 2020, 63 (10) : 1872 - 1897
[48] Empowering News Recommendation with Pre-trained Language Models
Wu, Chuhan
Wu, Fangzhao
Qi, Tao
Huang, Yongfeng
[J]. SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 1652 - 1656
[49] Understanding Online Attitudes with Pre-Trained Language Models
Power, William
Obradovic, Zoran
[J]. PROCEEDINGS OF THE 2023 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING, ASONAM 2023, 2023, : 745 - 752
[50] Capturing Semantics for Imputation with Pre-trained Language Models
Mei, Yinan
Song, Shaoxu
Fang, Chenguang
Yang, Haifeng
Fang, Jingyun
Long, Jiang
[J]. 2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021), 2021, : 61 - 72

← 1 2 3 4 5 →