Joint entity resolution on multiple datasets

Cited by: 10
Authors
Whang, Steven Euijong [1]
Garcia-Molina, Hector [2]
Affiliations
[1] Google Res, Mountain View, CA 94043 USA
[2] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Source
VLDB Journal | 2013, Vol. 22, No. 6
Funding
U.S. National Science Foundation
Keywords
Entity resolution; Joint entity resolution; Physical execution; Influence graph; Execution plan; Expander function; State-based training; Data cleaning; LINKAGE
DOI
10.1007/s00778-013-0308-z
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Entity resolution (ER) is the problem of identifying which records in a database represent the same entity. Often, records of different types are involved (e.g., authors, publications, institutions, venues), and resolving records of one type can impact the resolution of other types of records. In this paper we propose a flexible, modular resolution framework where existing ER algorithms developed for a given record type can be plugged in and used in concert with other ER algorithms. Our approach also makes it possible to run ER on subsets of similar records at a time, important when the full data are too large to resolve together. We study the scheduling and coordination of the individual ER algorithms, in order to resolve the full dataset, and show the scalability of our approach. We also introduce a "state-based" training technique where each ER algorithm is trained for the particular execution context (relative to other types of records) where it will be used.
Pages: 773-795
Page count: 23
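
The abstract describes resolving one record type and letting that result affect the resolution of other types. The following is a minimal sketch of that idea only, not the algorithm from the paper: the function names (`joint_resolve`, `merge_by_key`, `resolve_venues`, `resolve_papers`), the toy data, and the simple worklist scheduler are all assumptions made for illustration. It assumes each per-type resolver only ever merges clusters, so the loop reaches a fixed point.

```python
from collections import deque
from typing import Callable, Dict, FrozenSet, List, Set

Cluster = FrozenSet[str]
Partition = Set[Cluster]
State = Dict[str, Partition]            # record type -> current clustering
Resolver = Callable[[Partition, State], Partition]

# Hypothetical toy data: two venue records and two paper records.
VENUE_NAME = {"v1": "VLDB Journal", "v2": "vldb journal"}
PAPER = {
    "p1": {"title": "Joint ER", "venue": "v1"},
    "p2": {"title": "joint er", "venue": "v2"},
}

def merge_by_key(partition: Partition, key) -> Partition:
    """Merge all clusters that share the same key value."""
    groups: Dict[object, Set[str]] = {}
    for cluster in partition:
        groups.setdefault(key(cluster), set()).update(cluster)
    return {frozenset(members) for members in groups.values()}

def cluster_of(partition: Partition, item: str) -> Cluster:
    """Return the cluster that currently contains the given record id."""
    return next(c for c in partition if item in c)

def resolve_venues(partition: Partition, state: State) -> Partition:
    # Plug-in resolver for venues: match on case-insensitive name.
    return merge_by_key(partition, lambda c: VENUE_NAME[min(c)].lower())

def resolve_papers(partition: Partition, state: State) -> Partition:
    # Plug-in resolver for papers: same title AND venues already resolved
    # to the same venue cluster, so it depends on the venue clustering.
    def key(c: Cluster):
        p = PAPER[min(c)]
        return (p["title"].lower(), cluster_of(state["venue"], p["venue"]))
    return merge_by_key(partition, key)

def joint_resolve(state: State,
                  resolvers: Dict[str, Resolver],
                  influences: Dict[str, List[str]]) -> State:
    """Run resolvers until no clustering changes (simple worklist schedule).

    influences[t] lists the record types to re-resolve when type t changes.
    """
    queue = deque(resolvers)
    queued = set(resolvers)
    while queue:
        rtype = queue.popleft()
        queued.discard(rtype)
        new_partition = resolvers[rtype](state[rtype], state)
        if new_partition != state[rtype]:
            state[rtype] = new_partition
            for dependent in influences.get(rtype, []):
                if dependent not in queued:
                    queue.append(dependent)
                    queued.add(dependent)
    return state

state: State = {
    "paper": {frozenset({"p1"}), frozenset({"p2"})},
    "venue": {frozenset({"v1"}), frozenset({"v2"})},
}
resolvers = {"paper": resolve_papers, "venue": resolve_venues}
influences = {"venue": ["paper"]}       # venue merges can enable paper merges

result = joint_resolve(state, resolvers, influences)
print(result["paper"])
```

In this toy run the papers are tried first and do not merge, because their venues are still separate; once the venues merge, the influence edge puts papers back on the queue and the second pass merges them.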