Entity Resolution Based on Pre-trained Language Models with Two Attentions

Cited by: 0
Authors
Zhu, Liang [1]
Liu, Hao [1]
Song, Xin [1]
Wei, Yonggang [1]
Wang, Yu [1]
Affiliations
[1] Hebei Univ, Baoding 071002, Hebei, Peoples R China
Keywords
Entity Resolution; Pre-trained Language Model; Interactive Attention; Global Attention
DOI
10.1007/978-981-97-2387-4_29
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Entity Resolution (ER) is one of the most important tasks for improving data quality; it aims to identify records from one or more datasets that refer to the same real-world entity. For textual datasets whose attribute values are long word sequences, traditional ER methods may fail to accurately capture the semantic information of records, leading to poor effectiveness. To address this problem, we propose a novel entity resolution model, IGaBERT, which uses the pre-trained language model RoBERTa and fine-tunes it during training. In IGaBERT, interactive attention captures token-level differences between records and removes the requirement that the records being compared share an identical schema; global attention then determines the importance of these differences. Extensive experiments without injecting domain knowledge are conducted to measure the effectiveness of IGaBERT on both structured and textual datasets. The results indicate that IGaBERT significantly outperforms several state-of-the-art approaches on textual datasets, especially with small amounts of training data, and is highly competitive with those approaches on structured datasets.
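The abstract only outlines the architecture, so the sketch below is an illustrative reconstruction rather than the authors' released code. It assumes a RoBERTa encoder from Hugging Face transformers fine-tuned end to end, models the "interactive attention" as cross-attention between the token embeddings of the two records (so no shared schema is needed), and models the "global attention" as a learned soft-attention pooling over token-level difference vectors before a binary match/non-match classifier. All class and variable names (e.g. PairMatcherSketch) are hypothetical.

```python
# Illustrative sketch only (not the authors' code): a RoBERTa-based record matcher
# with cross-record ("interactive") attention and learned ("global") attention pooling.
import torch
import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizerFast


class PairMatcherSketch(nn.Module):  # hypothetical name
    def __init__(self, model_name: str = "roberta-base"):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained(model_name)  # fine-tuned with the rest of the model
        hidden = self.encoder.config.hidden_size
        # "Interactive attention": each record's tokens attend to the other record's tokens.
        self.cross_attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        # "Global attention": a learned scoring layer weights token-level difference features.
        self.global_score = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(hidden, 2)  # logits over {non-match, match}

    def forward(self, ids_a, mask_a, ids_b, mask_b):
        # Encode each record separately as plain text, so identical schemas are not required.
        h_a = self.encoder(input_ids=ids_a, attention_mask=mask_a).last_hidden_state
        h_b = self.encoder(input_ids=ids_b, attention_mask=mask_b).last_hidden_state
        # Tokens of record A attend to record B, and vice versa (shared attention weights).
        a2b, _ = self.cross_attn(h_a, h_b, h_b, key_padding_mask=~mask_b.bool())
        b2a, _ = self.cross_attn(h_b, h_a, h_a, key_padding_mask=~mask_a.bool())
        # Token-level "differences" between each record and its cross-attended view.
        diff = torch.cat([h_a - a2b, h_b - b2a], dim=1)  # (batch, len_a + len_b, hidden)
        # Global attention pooling: weight the importance of each difference vector
        # (padding positions are not masked out of the pooling in this sketch, for brevity).
        weights = torch.softmax(self.global_score(diff).squeeze(-1), dim=-1)
        pooled = torch.einsum("bt,bth->bh", weights, diff)
        return self.classifier(pooled)


# Usage sketch: serialize two records as text and score whether they match.
tok = RobertaTokenizerFast.from_pretrained("roberta-base")
rec_a = "title: iphone 13 128gb blue ; price: 699"
rec_b = "name: Apple iPhone 13 (128 GB, Blue)"
enc_a = tok(rec_a, return_tensors="pt", truncation=True)
enc_b = tok(rec_b, return_tensors="pt", truncation=True)
model = PairMatcherSketch()
logits = model(enc_a["input_ids"], enc_a["attention_mask"],
               enc_b["input_ids"], enc_b["attention_mask"])
```

Training would fine-tune the RoBERTa weights jointly with the attention and classifier layers on labeled match/non-match pairs, which is the usual setup for PLM-based entity matching; the paper's exact layer arrangement may differ from this sketch.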
Pages: 433-448
Number of Pages: 16