A Clustering-Based Framework for Incrementally Repairing Entity Resolution

Cited by: 2
Authors
Wang, Qing [1 ]
Gao, Jingyi [1 ]
Christen, Peter [1 ]
Affiliations
[1] Australian Natl Univ, Res Sch Comp Sci, Canberra, ACT 0200, Australia
Keywords
Data matching; Record linkage; Deduplication; Data provenance; Data repairing; Consistent clustering
DOI
10.1007/978-3-319-31750-2_23
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Although entity resolution (ER) is known to be an important problem that has widespread applications in many areas, including e-commerce, health-care, social science, and crime and fraud detection, one aspect that has largely been neglected is to monitor the quality of entity resolution and repair erroneous matching decisions over time. In this paper we develop an efficient method for incrementally repairing ER, i.e., fixing detected erroneous matches and non-matches. Our method is based on an efficient clustering algorithm that eliminates inconsistencies among matching decisions, and an efficient provenance indexing data structure that allows us to trace the evidence of clustering for supporting ER repairing. We have evaluated our method over real-world databases, and our experimental results show that the quality of entity resolution can be significantly improved through repairing over time.
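The abstract names two ingredients: clustering that removes inconsistencies among matching decisions, and a provenance index that records the evidence behind each cluster. The following is a minimal sketch of that general idea only, not the authors' algorithm. The function name `consistent_clusters`, the greedy highest-score-first merge order, and the representation of matches as `(record_a, record_b, score)` tuples are all assumptions made for illustration.

```python
def consistent_clusters(matches, non_matches):
    """Greedy sketch (hypothetical, not the paper's method).

    matches:     iterable of (record_a, record_b, score) match decisions
    non_matches: iterable of (record_a, record_b) pairs that must stay apart
    Returns (members, provenance, rejected).
    """
    forbidden = {frozenset(pair) for pair in non_matches}
    cluster_of = {}   # record -> cluster id
    members = {}      # cluster id -> set of records
    provenance = {}   # cluster id -> match decisions that built the cluster
    rejected = []     # match decisions dropped to keep clusters consistent

    def cluster_id(rec):
        # Lazily create a singleton cluster for an unseen record.
        if rec not in cluster_of:
            cluster_of[rec] = rec
            members[rec] = {rec}
            provenance[rec] = []
        return cluster_of[rec]

    # Process the strongest match decisions first.
    for a, b, score in sorted(matches, key=lambda m: m[2], reverse=True):
        ca, cb = cluster_id(a), cluster_id(b)
        if ca == cb:
            # Already clustered together: keep the decision as extra evidence.
            provenance[ca].append((a, b, score))
            continue
        # A merge is inconsistent if it joins any known non-match pair.
        conflict = any(
            frozenset((x, y)) in forbidden
            for x in members[ca]
            for y in members[cb]
        )
        if conflict:
            rejected.append((a, b, score))
            continue
        # Merge cluster cb into ca and carry its evidence along.
        for rec in members[cb]:
            cluster_of[rec] = ca
        members[ca] |= members.pop(cb)
        provenance[ca] += provenance.pop(cb) + [(a, b, score)]

    return members, provenance, rejected
```

Under these assumptions, a later repair could replay only the evidence stored for the affected cluster instead of re-clustering the whole database; whether the paper's provenance index works this way is not stated in the abstract.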
Pages: 283-295
Page count: 13
Related papers
50 in total
  • [1] A Clustering-Based Framework to Control Block Sizes for Entity Resolution
    Fisher, Jeffrey
    Christen, Peter
    Wang, Qing
    Rahm, Erhard
    [J]. KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2015, : 279 - 288
  • [2] Clustering-based Inference for Biomedical Entity Linking
    Angell, Rico
    Monath, Nicholas
    Mohan, Sunil
    Yadav, Nishant
    McCallum, Andrew
    [J]. 2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 2598 - 2608
  • [3] REPLACE: A Logical Framework for Combining Collective Entity Resolution and Repairing
    Bienvenu, Meghyn
    Cima, Gianluca
    Gutierrez-Basulto, Victor
    [J]. PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 3132 - 3139
  • [4] A Clustering-based Framework for Fast Training of Classifiers
    Sathyamoorthy, Sruthi
    Sivasankar, E.
    [J]. 2020 INTERNATIONAL CONFERENCE ON INNOVATIVE TRENDS IN INFORMATION TECHNOLOGY (ICITIIT), 2020,
  • [5] A Clustering-based Framework for Classifying Data Streams
    Yan, Xuyang
    Homaifar, Abdollah
    Sarkar, Mrinmoy
    Girma, Abenezer
    Tunstel, Edward
    [J]. PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 3257 - 3263
  • [6] Log Clustering-based Method for Repairing Missing Traces with Context Probability Information
    Fang, Huan
    Su, Wenjie
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (05) : 1445 - 1452
  • [7] Net Cluster: a Clustering-Based Framework for Internet Tomography
    Baralis, Elena
    Bianco, Andrea
    Cerquitelli, Tania
    Chiaraviglio, Luca
    Mellia, Marco
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, VOLS 1-8, 2009, : 2288 - +
  • [8] Novel Clustering-Based Web Service Recommendation Framework
    Pandharbale, Priya Bhaskar
    Mohanty, Sachi Nandan
    Jagadev, Alok Kumar
    [J]. INTERNATIONAL JOURNAL OF SYSTEM DYNAMICS APPLICATIONS, 2022, 11 (05)
  • [9] Optimized clustering-based discovery framework on Internet of Things
    Bharti, Monika
    Jindal, Himanshu
    [J]. JOURNAL OF SUPERCOMPUTING, 2021, 77 (02) : 1739 - 1778
  • [10] GenMatcher: A Generic Clustering-Based Arbitrary Matching Framework
    Wang, Ping
    Mchale, Luke
    Gratz, Paul, V
    Sprintson, Alex
    [J]. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2019, 15 (04)