Entity Resolution Based on Pre-trained Language Models with Two Attentions

Cited by: 0
Authors
Zhu, Liang [1]
Liu, Hao [1]
Song, Xin [1]
Wei, Yonggang [1]
Wang, Yu [1]
Affiliations
[1] Hebei Univ, Baoding 071002, Hebei, Peoples R China
Keywords
Entity Resolution; Pre-trained Language Model; Interactive Attention; Global Attention
DOI
10.1007/978-981-97-2387-4_29
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Entity Resolution (ER) is one of the most important tasks for improving data quality; it aims to identify records from one or more datasets that refer to the same real-world entity. For textual datasets whose attribute values are long word sequences, traditional ER methods may fail to accurately capture the semantic information of records, leading to poor effectiveness. To address this problem, we propose a novel entity resolution model, IGaBERT, which uses the pre-trained language model RoBERTa and fine-tunes it during training. In IGaBERT, interactive attention captures token-level differences between records and removes the requirement that the records being compared share an identical schema; global attention then determines the importance of these differences. Extensive experiments without injecting domain knowledge are conducted to measure the effectiveness of IGaBERT on both structured and textual datasets. The results indicate that IGaBERT significantly outperforms several state-of-the-art approaches on textual datasets, especially with small amounts of training data, and is highly competitive with those approaches on structured datasets.
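The abstract only outlines the architecture, so the sketch below is an illustrative reconstruction rather than the authors' released code. It assumes a RoBERTa encoder from Hugging Face transformers fine-tuned end to end, models the "interactive attention" as cross-attention between the token embeddings of the two records (so no shared schema is needed), and models the "global attention" as a learned soft-attention pooling over token-level difference vectors before a binary match/non-match classifier. All class and variable names (e.g. PairMatcherSketch) are hypothetical.

```python
# Illustrative sketch only (not the authors' code): a RoBERTa-based record matcher
# with cross-record ("interactive") attention and learned ("global") attention pooling.
import torch
import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizerFast


class PairMatcherSketch(nn.Module):  # hypothetical name
    def __init__(self, model_name: str = "roberta-base"):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained(model_name)  # fine-tuned with the rest of the model
        hidden = self.encoder.config.hidden_size
        # "Interactive attention": each record's tokens attend to the other record's tokens.
        self.cross_attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        # "Global attention": a learned scoring layer weights token-level difference features.
        self.global_score = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(hidden, 2)  # logits over {non-match, match}

    def forward(self, ids_a, mask_a, ids_b, mask_b):
        # Encode each record separately as plain text, so identical schemas are not required.
        h_a = self.encoder(input_ids=ids_a, attention_mask=mask_a).last_hidden_state
        h_b = self.encoder(input_ids=ids_b, attention_mask=mask_b).last_hidden_state
        # Tokens of record A attend to record B, and vice versa (shared attention weights).
        a2b, _ = self.cross_attn(h_a, h_b, h_b, key_padding_mask=~mask_b.bool())
        b2a, _ = self.cross_attn(h_b, h_a, h_a, key_padding_mask=~mask_a.bool())
        # Token-level "differences" between each record and its cross-attended view.
        diff = torch.cat([h_a - a2b, h_b - b2a], dim=1)  # (batch, len_a + len_b, hidden)
        # Global attention pooling: weight the importance of each difference vector
        # (padding positions are not masked out of the pooling in this sketch, for brevity).
        weights = torch.softmax(self.global_score(diff).squeeze(-1), dim=-1)
        pooled = torch.einsum("bt,bth->bh", weights, diff)
        return self.classifier(pooled)


# Usage sketch: serialize two records as text and score whether they match.
tok = RobertaTokenizerFast.from_pretrained("roberta-base")
rec_a = "title: iphone 13 128gb blue ; price: 699"
rec_b = "name: Apple iPhone 13 (128 GB, Blue)"
enc_a = tok(rec_a, return_tensors="pt", truncation=True)
enc_b = tok(rec_b, return_tensors="pt", truncation=True)
model = PairMatcherSketch()
logits = model(enc_a["input_ids"], enc_a["attention_mask"],
               enc_b["input_ids"], enc_b["attention_mask"])
```

Training would fine-tune the RoBERTa weights jointly with the attention and classifier layers on labeled match/non-match pairs, which is the usual setup for PLM-based entity matching; the paper's exact layer arrangement may differ from this sketch.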
Pages: 433-448
Number of Pages: 16