Certus: An Effective Entity Resolution Approach with Graph Differential Dependencies (GDDs)

Cited by: 26
Authors
Kwashie, Selasi [1]
Liu, Lin [1]
Liu, Jixue [1]
Stumptner, Markus [1]
Li, Jiuyong [1]
Yang, Lujing [1]
Affiliations
[1] Univ South Australia, Sch ITMS, Adelaide, SA, Australia
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2019 / Vol. 12 / No. 06
Keywords
FUNCTIONAL DEPENDENCY; BLOCKING; SET;
DOI
10.14778/3311880.3311883
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Entity resolution (ER) is the problem of accurately identifying multiple, differing, and possibly contradicting representations of unique real-world entities in data. It is a challenging and fundamental task in data cleansing and data integration. In this work, we propose graph differential dependencies (GDDs) as an extension of the recently developed graph entity dependencies (which are formal constraints for graph data) to enable approximate matching of values. Furthermore, we investigate a special discovery of GDDs for ER by designing an algorithm for generating a non-redundant set of GDDs in labelled data. Then, we develop an effective ER technique, Certus, that employs the learned GDDs for improving the accuracy of ER results. We perform extensive empirical evaluation of our proposals on five real-world ER benchmark datasets and a proprietary database to test their effectiveness and efficiency. The results from the experiments show the discovery algorithm and Certus are efficient; and more importantly, GDDs significantly improve the precision of ER without considerable trade-off of recall.
Pages: 653 - 666
Page count: 14
Related papers
50 in total
  • [1] An efficient approach for discovering Graph Entity Dependencies (GEDs)
    Liu, Dehua
    Kwashie, Selasi
    Zhang, Yidi
    Zhou, Guangtong
    Bewong, Michael
    Wu, Xiaoying
    Guo, Xi
    He, Keqing
    Feng, Zaiwen
    INFORMATION SYSTEMS, 2024, 125
  • [2] Discovering Graph Differential Dependencies
    Zhang, Yidi
    Kwashie, Selasi
    Bewong, Michael
    Hu, Junwei
    Mahboubi, Arash
    Guo, Xi
    Feng, Zaiwen
    DATABASES THEORY AND APPLICATIONS, ADC 2023, 2024, 14386 : 259 - 272
  • [3] GIG: Graph Data Imputation With Graph Differential Dependencies
    Hua, Jiang
    Bewong, Michael
    Kwashie, Selasi
    Rahman, Md Geaur
    Hui, Junwei
    Guo, Xi
    Feng, Zaiwen
    DATABASES THEORY AND APPLICATIONS, ADC 2024, 2025, 15449 : 347 - 358
  • [4] A Schema-Driven Synthetic Knowledge Graph Generation Approach With Extended Graph Differential Dependencies (GDDxs)
    Feng, Zaiwen
    Mayer, Wolfgang
    He, Keqing
    Kwashie, Selasi
    Stumptner, Markus
    Grossmann, Georg
    Peng, Rong
    Huang, Wangyu
    IEEE ACCESS, 2021, 9 : 5609 - 5639
  • [5] ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution
    Bahmani, Zeinab
    Bertossi, Leopoldo
    Vasiloglou, Nikolaos
    SCALABLE UNCERTAINTY MANAGEMENT (SUM 2015), 2015, 9310 : 399 - 414
  • [6] ERBlox: Combining matching dependencies with machine learning for entity resolution
    Bahmani, Zeinab
    Bertossi, Leopoldo
    Vasiloglou, Nikolaos
    INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2017, 83 : 118 - 141
  • [7] ERBlox: Combining matching dependencies with machine learning for entity resolution
    Bahmani, Zeinab
    Bertossi, Leopoldo
    Vasiloglou, Nikolaos
    International Journal of Approximate Reasoning, 2017, 83 : 118 - 141
  • [8] BEER: Blocking for Effective Entity Resolution
    Galhotra, Sainyam
    Firmani, Donatella
    Saha, Barna
    Srivastava, Divesh
    SIGMOD '21: PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2021, : 2711 - 2715
  • [9] Effective Explanations for Entity Resolution Models
    Teofili, Tommaso
    Firmani, Donatella
    Koudas, Nick
    Martello, Vincenzo
    Merialdo, Paolo
    Srivastava, Divesh
    2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022, : 2709 - 2721
  • [10] Entity Resolution with Hierarchical Graph Attention Networks
    Yao, Dezhong
    Gu, Yuhong
    Cong, Gao
    Jin, Hai
    Lv, Xinqiao
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022, : 429 - 442