Joint entity resolution on multiple datasets

Cited by: 10

Authors:

Whang, Steven Euijong [1]

Garcia-Molina, Hector [2]

Affiliations:

[1] Google Res, Mountain View, CA 94043 USA

[2] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA

Source:

VLDB JOURNAL | 2013, Vol. 22, No. 6

Funding:

National Science Foundation (USA)

Keywords:

Entity resolution; Joint entity resolution; Physical execution; Influence graph; Execution plan; Expander function; State-based training; Data cleaning; LINKAGE;

DOI:

10.1007/s00778-013-0308-z

Chinese Library Classification:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Entity resolution (ER) is the problem of identifying which records in a database represent the same entity. Often, records of different types are involved (e.g., authors, publications, institutions, venues), and resolving records of one type can impact the resolution of other types of records. In this paper we propose a flexible, modular resolution framework where existing ER algorithms developed for a given record type can be plugged in and used in concert with other ER algorithms. Our approach also makes it possible to run ER on subsets of similar records at a time, important when the full data are too large to resolve together. We study the scheduling and coordination of the individual ER algorithms, in order to resolve the full dataset, and show the scalability of our approach. We also introduce a "state-based" training technique where each ER algorithm is trained for the particular execution context (relative to other types of records) where it will be used.


Pages: 773-795

Page count: 23

