Joint entity resolution on multiple datasets

Cited by: 10
Authors
Whang, Steven Euijong [1]
Garcia-Molina, Hector [2]
Affiliations
[1] Google Research, Mountain View, CA 94043 USA
[2] Stanford University, Department of Computer Science, Stanford, CA 94305 USA
Source
VLDB JOURNAL | 2013 / Vol. 22 / No. 6
Funding
National Science Foundation (NSF)
Keywords
Entity resolution; Joint entity resolution; Physical execution; Influence graph; Execution plan; Expander function; State-based training; Data cleaning; LINKAGE;
DOI
10.1007/s00778-013-0308-z
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Entity resolution (ER) is the problem of identifying which records in a database represent the same entity. Often, records of different types are involved (e.g., authors, publications, institutions, venues), and resolving records of one type can impact the resolution of other types of records. In this paper we propose a flexible, modular resolution framework where existing ER algorithms developed for a given record type can be plugged in and used in concert with other ER algorithms. Our approach also makes it possible to run ER on subsets of similar records at a time, important when the full data are too large to resolve together. We study the scheduling and coordination of the individual ER algorithms, in order to resolve the full dataset, and show the scalability of our approach. We also introduce a "state-based" training technique where each ER algorithm is trained for the particular execution context (relative to other types of records) where it will be used.
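The abstract describes the framework only at a high level. As a minimal sketch, assuming two hypothetical record types (venues and papers), toy exact-match rules, and an acyclic influence graph, the Python below shows the general idea of plugging per-type resolvers into a shared driver that runs them in influence order and repeats until no resolver merges anything. The names `UnionFind`, `resolve_venues`, `resolve_papers`, and `joint_resolve` are illustrative assumptions, not the authors' algorithms or API; the paper's physical execution plans, expander functions, and state-based training are not modeled.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class UnionFind:
    """Disjoint-set structure holding the current clusters of one record type."""

    def __init__(self, ids):
        self.parent = {i: i for i in ids}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the clusters of a and b; return True if anything changed."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[rb] = ra
        return True


def resolve_venues(venues, clusters):
    """Hypothetical venue ER: merge venues whose normalized names are equal."""
    changed = False
    first_by_key = {}
    for vid, rec in venues.items():
        key = rec["name"].lower().replace(".", "").strip()
        if key in first_by_key:
            changed |= clusters["venue"].union(first_by_key[key], vid)
        else:
            first_by_key[key] = vid
    return changed


def resolve_papers(papers, clusters):
    """Hypothetical paper ER: merge papers with equal titles AND the same
    resolved venue cluster, so venue resolution influences paper resolution."""
    changed = False
    first_by_key = {}
    for pid, rec in papers.items():
        key = (rec["title"].lower(), clusters["venue"].find(rec["venue"]))
        if key in first_by_key:
            changed |= clusters["paper"].union(first_by_key[key], pid)
        else:
            first_by_key[key] = pid
    return changed


def joint_resolve(data, resolvers, influences, max_rounds=10):
    """Run each type's resolver in influence-graph order until a fixpoint.

    `influences` maps a record type to the types whose results it reads.
    This sketch assumes the graph is acyclic (static_order raises CycleError
    otherwise) and that resolvers only ever merge clusters.
    """
    clusters = {rtype: UnionFind(recs) for rtype, recs in data.items()}
    order = list(TopologicalSorter(influences).static_order())
    for _ in range(max_rounds):
        changed = False
        for rtype in order:
            changed |= resolvers[rtype](data[rtype], clusters)
        if not changed:  # no resolver merged anything: stop
            break
    return clusters


# Toy data: p1 and p2 cite different raw venue strings for the same venue.
data = {
    "venue": {"v1": {"name": "VLDB J."}, "v2": {"name": "vldb j"}},
    "paper": {
        "p1": {"title": "Joint ER", "venue": "v1"},
        "p2": {"title": "joint er", "venue": "v2"},
        "p3": {"title": "Other work", "venue": "v1"},
    },
}
result = joint_resolve(
    data,
    resolvers={"venue": resolve_venues, "paper": resolve_papers},
    influences={"paper": {"venue"}},  # paper resolution reads venue clusters
)
print(result["paper"].find("p2"))  # prints: p1
```

Because p1 and p2 reference different raw venue records, they merge only after venue resolution has clustered v1 and v2, which is the kind of cross-type influence the abstract describes. Swapping in a different resolver for one type leaves the driver unchanged, loosely mirroring the plug-in modularity the abstract claims.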
Pages: 773-795
Page count: 23
Related papers
50 in total
  • [1] Joint entity resolution on multiple datasets
    Whang, Steven Euijong
    Garcia-Molina, Hector
    VLDB JOURNAL, 2013, 22 (06) : 773 - 795
  • [2] Network metrics for assessing the quality of entity resolution between multiple datasets
    Al Idrissou
    van Harmelen, Frank
    van den Besselaar, Peter
    SEMANTIC WEB, 2021, 12 (01) : 21 - 40
  • [3] Joint Entity Resolution
    Whang, Steven Euijong
    Garcia-Molina, Hector
    2012 IEEE 28TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2012 : 294 - 305
  • [4] Synthesizing Privacy Preserving Entity Resolution Datasets
    Qin, Xuedi
    Chai, Chengliang
    Tang, Nan
    Li, Jian
    Luo, Yuyu
    Li, Guoliang
    Zhu, Yaoyu
    2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022 : 2359 - 2371
  • [5] Using Transliteration with Entity Resolution for Arabic Datasets
    Alian, Marwah
    Al-Naymat, Ghazi
    Ramadan, Banda
    2017 IEEE/ACS 14TH INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2017 : 593 - 597
  • [6] Extensible and Scalable Entity Resolution for Financial Datasets Using RLTK
    Yao, Yixiang
    Szekely, Pedro
    Pujara, Jay
    PROCEEDINGS OF THE FIFTH INTERNATIONAL WORKSHOP ON DATA SCIENCE FOR MACRO-MODELING (DSMM 2019), 2019
  • [7] A related data oriented joint entity resolution approach
    Sun, Chen-Chen
    Shen, De-Rong
    Kou, Yue
    Nie, Tie-Zheng
    Yu, Ge
    Jisuanji Xuebao/Chinese Journal of Computers, 2015, 38 (09) : 1740 - 1754
  • [8] Joint Learning of Named Entity Recognition and Dependency Parsing using Separate Datasets
    Akdemir, Arda
    Gungor, Tunga
    COMPUTACION Y SISTEMAS, 2019, 23 (03) : 841 - 850
  • [9] Underdetermined Joint Blind Source Separation of Multiple Datasets
    Zou, Liang
    Chen, Xun
    Ji, Xiangyang
    Wang, Z. Jane
    IEEE ACCESS, 2017, 5 : 7474 - 7487
  • [10] Joint Preprocessing of Multiple Datasets to Enhance Source Separation
    Naghsh, Erfan
    Sabahi, Mohamad F.
    Beheshti, Soosan
    IEEE SIGNAL PROCESSING LETTERS, 2019, 26 (12) : 1917 - 1921