Entity Resolution Based on Pre-trained Language Models with Two Attentions

Cited by: 0
Authors
Zhu, Liang [1 ]
Liu, Hao [1 ]
Song, Xin [1 ]
Wei, Yonggang [1 ]
Wang, Yu [1 ]
Affiliations
[1] Hebei Univ, Baoding 071002, Hebei, Peoples R China
Keywords
Entity Resolution; Pre-trained Language Model; Interactive Attention; Global Attention
DOI
10.1007/978-981-97-2387-4_29
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Entity Resolution (ER) is one of the most important tasks for improving data quality; it aims to identify records from one or more datasets that refer to the same real-world entity. For textual datasets whose attribute values are long word sequences, traditional ER methods may fail to accurately capture the semantic information of records, leading to poor effectiveness. To address this challenging problem, we propose a novel entity resolution model, IGaBERT, which uses the pre-trained language model RoBERTa and fine-tunes it during training. In IGaBERT, interactive attention captures token-level differences between records and removes the restriction that the records must share an identical schema, and global attention then determines the importance of these differences. Extensive experiments without injecting domain knowledge are conducted to measure the effectiveness of IGaBERT on both structured and textual datasets. The results indicate that IGaBERT significantly outperforms several state-of-the-art approaches on textual datasets, especially with small amounts of training data, and is highly competitive with those approaches on structured datasets.
Pages: 433-448
Page count: 16
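
The abstract describes IGaBERT only at a high level: a RoBERTa encoder fine-tuned during training, interactive attention over token-level differences between two records, and global attention that weights those differences. The following is a minimal PyTorch sketch of how such a two-attention matcher could be wired together, assuming a Hugging Face RoBERTa encoder; the class name IGaBERTSketch, the difference-based alignment, and the pooling and classification layers are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a two-attention ER matcher in the spirit of IGaBERT.
# Assumes the Hugging Face Transformers RoBERTa encoder; all layer names,
# dimensions and attention formulations here are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import RobertaModel


class IGaBERTSketch(nn.Module):
    def __init__(self, hidden: int = 768):
        super().__init__()
        # Encoder is not frozen, so it is fine-tuned together with the heads.
        self.encoder = RobertaModel.from_pretrained("roberta-base")
        self.global_attn = nn.Linear(hidden, 1)      # scores each token-level difference
        self.classifier = nn.Linear(2 * hidden, 2)   # match / non-match logits

    def forward(self, ids_a, mask_a, ids_b, mask_b):
        # Encode both records as plain token sequences (no shared schema required).
        h_a = self.encoder(ids_a, attention_mask=mask_a).last_hidden_state  # (B, La, H)
        h_b = self.encoder(ids_b, attention_mask=mask_b).last_hidden_state  # (B, Lb, H)

        # Interactive attention: align each token of record A against record B
        # and keep the token-level differences between the two views.
        sim = torch.matmul(h_a, h_b.transpose(1, 2))          # (B, La, Lb)
        align_a = torch.softmax(sim, dim=-1) @ h_b            # B-aware view of A's tokens
        diff_a = h_a - align_a                                 # token-level differences

        # Global attention: weight how much each difference matters for the decision.
        scores = self.global_attn(diff_a).squeeze(-1)          # (B, La)
        scores = scores.masked_fill(mask_a == 0, -1e9)         # ignore padding positions
        w = torch.softmax(scores, dim=-1)
        summary = (w.unsqueeze(-1) * diff_a).sum(dim=1)        # (B, H)

        cls_a = h_a[:, 0]                                      # pooled <s> (CLS-style) view of A
        return self.classifier(torch.cat([cls_a, summary], dim=-1))
```

In this sketch, "fine-tuning" simply means the encoder parameters are left trainable, so gradients from the match/non-match loss update RoBERTa together with the two attention heads.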