Interpretability of Entity Matching Based on Pre-trained Language Model

Cited by: 0
Authors
Liang Z. [1 ]
Wang H.-Z. [1 ]
Dai J.-J. [1 ]
Shao X.-Y. [1 ]
Ding X.-O. [1 ]
Mu T.-Y. [1 ]
Affiliations
[1] Faculty of Computing, Harbin Institute of Technology, Harbin
Source
Ruan Jian Xue Bao/Journal of Software | 2023 / Vol. 34 / No. 3
Keywords
entity matching; interpretability; pre-trained language model;
DOI
10.13328/j.cnki.jos.006794
Abstract
Entity matching determines whether records in two datasets refer to the same real-world entity, and is indispensable for tasks such as big data integration, social network analysis, and Web semantic data management. Pre-trained language models, a deep learning technology that has achieved great success in natural language processing and computer vision, have also outperformed traditional methods on entity matching tasks and attracted the attention of many researchers. However, the performance of entity matching based on pre-trained language models is unstable and the matching results cannot be explained, which brings great uncertainty to applying this technology in big data integration. Moreover, existing interpretation methods for entity matching models are mainly model-agnostic methods oriented to general machine learning, and their applicability to pre-trained language models is limited. Therefore, taking BERT-based entity matching models such as Ditto and JointBERT as examples, this study proposes three interpretation methods for entity matching with pre-trained language models: (1) since the serialization operation is sensitive to the order of relational data attributes, dataset meta-features and attribute similarity are used to generate attribute-order counterfactuals for misclassified samples; (2) as a supplement to traditional attribute importance measurement, the attention weights of the pre-trained language model are used to measure attribute importance and to visualize the model's processing; (3) based on the serialized sentence vectors, k-nearest-neighbor search is used to recall well-interpretable samples similar to the misclassified ones, enhancing the model's low-confidence predictions.
Experiments on real-world public datasets show that, while the enhancement method improves model performance, the proposed methods reach 68.8% of the fidelity upper bound in the attribute-order search space, providing decision explanations for pre-trained language model based entity matching. New perspectives such as attribute-order counterfactuals and attribute-association understanding are also introduced. © 2023 Chinese Academy of Sciences. All rights reserved.
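The first method rests on the fact that serialization flattens a record into a token sequence whose attribute order matters. A minimal sketch of Ditto-style serialization, assuming illustrative `[COL]`/`[VAL]` markers as described in the Ditto paper (function and record names here are hypothetical, not the paper's API):

```python
# Hypothetical sketch: flatten a record into a Ditto-style token sequence.
# Permuting the attribute order yields a different input string, which is
# the basis of the attribute-order counterfactuals described above.

def serialize(record, attr_order=None):
    """Flatten a record dict into '[COL] attr [VAL] value ...' form."""
    order = attr_order or list(record.keys())
    return " ".join(f"[COL] {a} [VAL] {record[a]}" for a in order)

record = {"title": "iPhone 12", "brand": "Apple", "price": "699"}

s1 = serialize(record)
s2 = serialize(record, attr_order=["price", "brand", "title"])
# Same record, two serializations -> potentially different model predictions.
print(s1)
print(s2)
```

Searching over such permutations for a misclassified sample, guided by dataset meta-features and attribute similarity, yields the counterfactual orderings the abstract refers to.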
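The third method can be sketched as a cosine-similarity k-nearest-neighbor search over sentence vectors. The toy vectors and sample IDs below are stand-ins for the pre-trained model's embeddings of serialized records, not the paper's actual data:

```python
# Minimal sketch (assumed setup): recall the k training samples nearest to a
# misclassified sample in embedding space, so their well-interpreted labels
# can support a low-confidence prediction.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def knn_recall(query_vec, bank, k=2):
    """Return the k most similar (sample_id, score) pairs from the bank."""
    scored = [(sid, cosine(query_vec, vec)) for sid, vec in bank.items()]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:k]

bank = {
    "match_A":    [0.9, 0.1, 0.0],
    "match_B":    [0.8, 0.2, 0.1],
    "nonmatch_C": [0.0, 0.1, 0.9],
}
neighbors = knn_recall([0.85, 0.15, 0.05], bank, k=2)
print(neighbors)
```

In practice an approximate nearest-neighbor index would replace the linear scan, but the recall logic is the same.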
Pages: 1087-1108
Page count: 21
References
34 in total
  • [1] Doan AH, Halevy AY, Ives ZG., Principles of Data Integration, (2012)
  • [2] Dong XL, Rekatsinas T., Data integration and machine learning: A natural synergy, Proc. of the Int’l Conf. on Management of Data (SIGMOD 2018), pp. 1645-1650, (2018)
  • [3] Wang J, Li G, Yu JX, Feng J., Entity matching: How similar is similar, Proc. of the VLDB Endowment, 4, 10, pp. 622-633, (2011)
  • [4] Chai C, Li G, Li J, Deng D, Feng J., A partial-order-based framework for cost-effective crowdsourced entity resolution, VLDB Journal, 27, 6, pp. 745-770, (2018)
  • [5] Das S, et al., Falcon: Scaling up hands-off crowdsourced entity matching to build cloud services, Proc. of the ACM Int’l Conf. on Management of Data (SIGMOD 2017), pp. 1431-1446, (2017)
  • [6] Ebraheem M, Thirumuruganathan S, Joty SR, Ouzzani M, Tang N., Distributed representations of tuples for entity resolution, Proc. of the VLDB Endowment, 11, 11, pp. 1454-1467, (2018)
  • [7] Li Y, Li J, Suhara Y, et al., Deep entity matching with pre-trained language models, Proc. of the VLDB Endowment, 14, 1, pp. 50-60, (2020)
  • [8] Tu JH, Fan J, Tang N, Wang P, Chai CL, Li GL, Fan RX, Du XY., Domain adaptation for deep entity resolution, Proc. of the Int’l Conf. on Management of Data (SIGMOD 2022), pp. 443-457, (2022)
  • [9] Peeters R, Bizer C., Dual-objective fine-tuning of BERT for entity matching, Proc. of the VLDB Endowment, 14, 10, pp. 1913-1921, (2021)
  • [10] Ebaid A, Thirumuruganathan S, Aref WG, et al., EXPLAINER: Entity resolution explanations, Proc. of the ICDE, pp. 2000-2003, (2019)