Knowledge Transfer for Entity Resolution with Siamese Neural Networks

Times cited: 8

Authors:

Loster, Michael [1]

Koumarelas, Ioannis [1]

Naumann, Felix [1]

Affiliation:

[1] Univ Potsdam, Hasso Plattner Inst, Potsdam, Germany

Source:

ACM JOURNAL OF DATA AND INFORMATION QUALITY | 2021 / Vol. 13 / No. 01

Keywords:

Entity resolution; duplicate detection; transfer learning; neural networks; metric learning; similarity learning; data quality

DOI:

10.1145/3410157

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

The integration of multiple data sources is a common problem in a wide variety of applications. Traditionally, handcrafted similarity measures are used to discover, merge, and integrate multiple representations of the same entity (duplicates) into a large, homogeneous collection of data. Often, these similarity measures do not cope well with the heterogeneity of the underlying dataset. In addition, domain experts are needed to manually design and configure such measures, which is time-consuming and requires extensive domain knowledge. We propose a deep Siamese neural network that learns a similarity measure tailored to the characteristics of a particular dataset. By leveraging deep learning, we eliminate the manual feature engineering process and thus considerably reduce the effort required for model construction. Furthermore, we show that knowledge acquired while deduplicating one dataset can be transferred to another, significantly reducing the amount of training data required to learn a similarity measure. We evaluate our method on multiple datasets and compare it to state-of-the-art deduplication methods. Depending on the task and dataset, our approach outperforms competitors by up to +26 percent F-measure. Moreover, we show that knowledge transfer is not only feasible but, in our experiments, led to an improvement in F-measure of up to +4.7 percent.
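The abstract describes the Siamese approach only at a high level. The following is a minimal, pure-Python sketch of the general idea, not the authors' implementation. It assumes hashed character trigrams as input features (a stand-in for whatever record representation the paper actually uses), a single shared dense layer as the encoder, Euclidean distance between the two embeddings, and a standard contrastive loss. The names `featurize`, `Encoder`, `siamese_distance`, `contrastive_loss`, and `transfer_encoder`, as well as the sizes `DIM_IN` and `DIM_OUT`, are hypothetical. Gradient-based training is omitted; `transfer_encoder` only illustrates the knowledge-transfer idea of starting a target-dataset model from weights learned on a source dataset.

```python
import math
import random
import zlib

# Hypothetical sizes; the paper's actual architecture and dimensions are not given here.
DIM_IN = 256   # size of the hashed character-trigram feature space (assumption)
DIM_OUT = 16   # size of the learned embedding (assumption)


def featurize(record: str, n: int = 3) -> list[float]:
    """Assumed input representation: hashed character n-gram counts (hashing trick)."""
    vec = [0.0] * DIM_IN
    text = f"#{record.lower()}#"  # pad so that prefixes and suffixes form n-grams
    for i in range(len(text) - n + 1):
        # crc32 is deterministic across runs, unlike Python's salted built-in hash()
        vec[zlib.crc32(text[i:i + n].encode()) % DIM_IN] += 1.0
    return vec


class Encoder:
    """One dense layer with tanh; the same instance encodes both records of a pair."""

    def __init__(self, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(DIM_IN)
        self.weights = [[rng.gauss(0.0, scale) for _ in range(DIM_IN)]
                        for _ in range(DIM_OUT)]

    def __call__(self, x: list[float]) -> list[float]:
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.weights]


def siamese_distance(encoder: Encoder, left: str, right: str) -> float:
    """Euclidean distance between the embeddings of two records (shared weights)."""
    a, b = encoder(featurize(left)), encoder(featurize(right))
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def contrastive_loss(distance: float, is_duplicate: bool, margin: float = 1.0) -> float:
    """Pull duplicates together; push non-duplicates at least `margin` apart."""
    if is_duplicate:
        return 0.5 * distance ** 2
    return 0.5 * max(0.0, margin - distance) ** 2


def transfer_encoder(source: Encoder) -> Encoder:
    """Knowledge-transfer sketch: copy source-dataset weights as the starting point
    for a target-dataset encoder, which would then be fine-tuned on fewer labels."""
    target = Encoder()
    target.weights = [row[:] for row in source.weights]
    return target


if __name__ == "__main__":
    enc = Encoder(seed=42)
    print(siamese_distance(enc, "Hasso Plattner Inst", "Hasso Plattner Institute"))
    print(siamese_distance(enc, "Hasso Plattner Inst", "Univ Potsdam"))
```

Identical records always map to distance 0.0 because both branches share one encoder. Whether the authors fine-tune all transferred layers or freeze some of them is not stated in this record, so the copy-then-fine-tune step above is only one plausible reading of "knowledge transfer".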


Pages: 25

50 records in total

[1] Joint Multi-field Siamese Recurrent Neural Network for Entity Resolution
Lv, Yang
Qi, Lei
Huo, Jing
Wang, Hao
Gao, Yang
PRICAI 2018: TRENDS IN ARTIFICIAL INTELLIGENCE, PT II, 2018, 11013 : 482 - 490
[2] KD-FIXMATCH: KNOWLEDGE DISTILLATION SIAMESE NEURAL NETWORKS
Wang, Chien-Chih
Xu, Shaoyuan
Fu, Jinmiao
Liu, Yang
Wang, Bryan
2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023 : 341 - 345
[3] Knowledge transfer in SVM and neural networks
Vapnik, Vladimir
Izmailov, Rauf
ANNALS OF MATHEMATICS AND ARTIFICIAL INTELLIGENCE, 2017, 81 (1-2) : 3 - 19
[4] Siamese neural networks in recommendation
Serrano, Nicolas
Bellogin, Alejandro
NEURAL COMPUTING & APPLICATIONS, 2023, 35 (19) : 13941 - 13953
[5] Business Entity Matching with Siamese Graph Convolutional Networks
Krivosheev, Evgeny
Atzeni, Mattia
Mirylenka, Katsiaryna
Scotton, Paolo
Miksovic, Christoph
Zorin, Anton
THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 16054 - 16056
[6] Transfer Learning for Named-Entity Recognition with Neural Networks
Lee, Ji Young
Dernoncourt, Franck
Szolovits, Peter
PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018 : 4470 - 4473
[7] Transfer learning for biomedical named entity recognition with neural networks
Giorgi, John M.
Bader, Gary D.
BIOINFORMATICS, 2018, 34 (23) : 4087 - 4094
[8] Training binary neural networks with knowledge transfer
Leroux, Sam
Vankeirsbilck, Bert
Verbelen, Tim
Simoens, Pieter
Dhoedt, Bart
NEUROCOMPUTING, 2020, 396 : 534 - 541
