Query-Driven Sampling for Collective Entity Resolution

Cited by: 2
Authors
Grant, Christan [1]
Wang, Daisy Zhe [2]
Wick, Michael [3]
Affiliations
[1] Univ Oklahoma, Norman, OK 73019 USA
[2] Univ Florida, Gainesville, FL 32611 USA
[3] Univ Massachusetts, Amherst, MA 01003 USA
DOI
10.1109/IRI.2016.34
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Entity resolution (ER) is the process of identifying records (mentions) in a database that correspond to the same real-world entity. Traditional pairwise ER methods can lead to inconsistencies and low accuracy because they make localized decisions. Leading ER systems solve this problem by collectively resolving all records using a probabilistic graphical model and Markov chain Monte Carlo (MCMC) inference. For large datasets, however, this is an extremely expensive process. One key observation is that such an exhaustive ER process incurs a huge up-front cost, which is wasteful in practice because most users are interested in only a small subset of entities. In this paper, we advocate pay-as-you-go entity resolution by developing a number of query-driven collective ER techniques. We introduce two classes of SQL queries that involve ER operators: selection-driven ER and join-driven ER. We implement novel variations of the MCMC Metropolis-Hastings algorithm to generate biased samples, and selectivity-based scheduling algorithms, to support the two classes of ER queries. Finally, we show that query-driven ER algorithms can converge and return results within minutes over a database populated with extractions from a newswire dataset containing 71 million mentions.
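To make the idea of query-biased sampling concrete, here is a minimal toy sketch in Python. It is not the authors' algorithm: the token-overlap `affinity` score, the `query_weights` bias, and the names `resolve` and `query_entity` are all hypothetical, chosen only for illustration. The sketch assumes a symmetric proposal (the mention to move is picked with fixed, query-dependent weights, and the target cluster label is picked uniformly), so the plain Metropolis acceptance rule applies without a proposal-ratio correction.

```python
import math
import random


def tokens(mention):
    # Lowercased whitespace tokens; mentions are assumed to be non-empty.
    return set(mention.lower().split())


def jaccard(a, b):
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


def affinity(a, b):
    # Hypothetical pairwise score: positive when two mentions share
    # enough tokens, negative when they share none.
    return 4.0 * (jaccard(a, b) - 0.25)


def query_weights(mentions, query, eps=0.1):
    # Fixed selection weights biased toward mentions that resemble the query.
    # eps keeps every weight positive so every mention can still be moved.
    raw = [eps + jaccard(query, m) for m in mentions]
    total = sum(raw)
    return [r / total for r in raw]


def resolve(mentions, query, steps=5000, seed=0):
    # Metropolis sampling over cluster labels, with target density
    # proportional to exp(sum of affinities within each cluster).
    rng = random.Random(seed)
    n = len(mentions)
    labels = list(range(n))  # start with every mention in its own cluster
    weights = query_weights(mentions, query)
    for _ in range(steps):
        i = rng.choices(range(n), weights=weights)[0]  # query-biased pick
        new = rng.randrange(n)                         # uniform target label
        old = labels[i]
        if new == old:
            continue
        # Score change from moving mention i out of `old` and into `new`.
        gain = sum(affinity(mentions[i], mentions[j])
                   for j in range(n) if j != i and labels[j] == new)
        loss = sum(affinity(mentions[i], mentions[j])
                   for j in range(n) if j != i and labels[j] == old)
        delta = gain - loss
        if delta >= 0 or rng.random() < math.exp(delta):
            labels[i] = new
    return labels


def query_entity(mentions, labels, query):
    # Mentions that share a cluster with the mention closest to the query.
    anchor = max(range(len(mentions)), key=lambda k: jaccard(query, mentions[k]))
    return sorted(m for m, l in zip(mentions, labels) if l == labels[anchor])


if __name__ == "__main__":
    mentions = ["Barack Obama", "Obama", "President Obama",
                "Angela Merkel", "Merkel", "Chancellor Merkel"]
    labels = resolve(mentions, "Obama")
    print(query_entity(mentions, labels, "Obama"))
```

Because mentions resembling the query are proposed more often, the clusters around the query tend to mix sooner than the rest of the database, which is the intuition behind paying only for the entities a user asks about. The sample returned is a single random state, so repeated runs with different seeds may give different clusterings.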
Pages: 208-217
Page count: 10