Query-Driven Sampling for Collective Entity Resolution

Cited by: 2
Authors
Grant, Christan [1]
Wang, Daisy Zhe [2]
Wick, Michael [3]
Affiliations
[1] Univ Oklahoma, Norman, OK 73019 USA
[2] Univ Florida, Gainesville, FL 32611 USA
[3] Univ Massachusetts, Amherst, MA 01003 USA
DOI
10.1109/IRI.2016.34
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
Entity resolution (ER) is the process of determining which records (mentions) in a database correspond to the same real-world entity. Traditional pairwise ER methods can produce inconsistent results and low accuracy because they make localized decisions. Leading ER systems address this problem by collectively resolving all records using a probabilistic graphical model and Markov chain Monte Carlo (MCMC) inference. However, this process is extremely expensive for large datasets. A key observation is that such an exhaustive ER process incurs a huge up-front cost, which is wasteful in practice because most users are interested in only a small subset of entities. In this paper, we advocate pay-as-you-go entity resolution by developing a number of query-driven collective ER techniques. We introduce two classes of SQL queries that involve ER operators: selection-driven ER and join-driven ER. We implement novel variations of the MCMC Metropolis-Hastings algorithm that generate biased samples, together with selectivity-based scheduling algorithms, to support the two classes of ER queries. Finally, we show that query-driven ER algorithms can converge and return results within minutes over a database populated with extractions from a newswire dataset containing 71 million mentions.
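The abstract describes collective ER by Metropolis-Hastings sampling, with samples biased toward the entities a query asks about. The sketch below illustrates that general idea under our own assumptions; it is not the authors' model or code. Mentions are plain strings, the hypothetical `affinity` factor simply rewards shared name tokens, entity ids come from a fixed pool of size n, and a selection query is approximated by a substring match (`query`) that makes matching mentions `bias` times more likely to be proposed. All function names (`tokens`, `affinity`, `log_score`, `query_driven_mh`) are illustrative.

```python
import math
import random
import re


def tokens(mention: str) -> set:
    """Lower-cased alphabetic tokens of a mention string (punctuation dropped)."""
    return set(re.findall(r"[a-z]+", mention.lower()))


def affinity(a: str, b: str) -> float:
    """Hypothetical pairwise factor: +3 if two mentions share a token, else -3."""
    return 3.0 if tokens(a) & tokens(b) else -3.0


def log_score(mentions, labels) -> float:
    """Unnormalized log-probability: sum of affinities over co-referent pairs."""
    n = len(mentions)
    return sum(
        affinity(mentions[i], mentions[j])
        for i in range(n)
        for j in range(i + 1, n)
        if labels[i] == labels[j]
    )


def query_driven_mh(mentions, query, iters, bias=10.0, seed=0):
    """Metropolis-Hastings over entity labels, biased toward query-relevant mentions.

    Each step picks one mention (those containing `query` are `bias` times more
    likely) and proposes moving it to a uniformly chosen entity id. The selection
    weights never depend on the current clustering, so the proposal is symmetric
    and the acceptance probability is min(1, exp(delta)).
    Returns the highest-scoring clustering seen and its log-score.
    """
    rng = random.Random(seed)
    n = len(mentions)
    labels = list(range(n))  # start with every mention as its own entity
    weights = [bias if query.lower() in m.lower() else 1.0 for m in mentions]

    def gain(m, label):
        # Affinity between mention m and every other mention currently in `label`.
        return sum(
            affinity(mentions[m], mentions[j])
            for j in range(n)
            if j != m and labels[j] == label
        )

    current = log_score(mentions, labels)
    best_labels, best = labels[:], current
    for _ in range(iters):
        m = rng.choices(range(n), weights=weights)[0]
        old, new = labels[m], rng.randrange(n)
        if new == old:
            continue  # proposal equals the current state: a trivial self-transition
        # Score change from moving mention m out of entity `old` into entity `new`.
        delta = gain(m, new) - gain(m, old)
        if delta >= 0 or rng.random() < math.exp(delta):
            labels[m] = new
            current += delta
            if current > best:
                best_labels, best = labels[:], current
    return best_labels, best


if __name__ == "__main__":
    mentions = ["Barack Obama", "Obama", "B. Obama",
                "Hillary Clinton", "Clinton", "H. Clinton"]
    labels, score = query_driven_mh(mentions, query="obama", iters=5000)
    print(labels, score)
```

In this sketch every mention keeps a nonzero selection weight, so the chain still targets the full model distribution; the bias only changes where the sampler spends most of its moves, which is the pay-as-you-go trade-off the abstract argues for.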
Pages: 208-217
Page count: 10