Active Blocking Scheme Learning for Entity Resolution

Cited by: 3
Authors
Shao, Jingyu [1]
Wang, Qing [1]
Affiliations
[1] Australian Natl Univ, Res Sch Comp Sci, Canberra, ACT, Australia
Funding
Australian Research Council
Keywords
Entity resolution; Blocking scheme; Active learning; Record linkage
DOI
10.1007/978-3-319-93037-4_28
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Blocking is an important part of entity resolution. It aims to improve time efficiency by grouping potentially matched records into the same block. In the past, both supervised and unsupervised approaches have been proposed. Nonetheless, existing approaches have limitations: either a large number of labels is required or blocking quality is hard to guarantee. To address these issues, we propose a blocking scheme learning approach based on active learning techniques. With a limited label budget, our approach can learn a blocking scheme that generates high-quality blocks. Two strategies, called active sampling and active branching, are proposed to select samples and generate blocking schemes efficiently. We experimentally verify that our approach outperforms several baseline approaches over four real-world datasets.
Pages: 350 - 362
Number of pages: 13
Related papers
50 in total
  • [21] Efficient Spectral Neighborhood Blocking for Entity Resolution
    Shu, Liangcai
    Chen, Aiyou
    Xiong, Ming
    Meng, Weiyi
    IEEE 27TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2011), 2011, : 1067 - 1078
  • [22] Blocking and Filtering Techniques for Entity Resolution: A Survey
    Papadakis, George
    Skoutas, Dimitrios
    Thanos, Emmanouil
    Palpanas, Themis
    ACM COMPUTING SURVEYS, 2020, 53 (02)
  • [23] Semantic-Aware Blocking for Entity Resolution
    Wang, Qing
    Cui, Mingyuan
    Liang, Huizhi
    2016 32ND IEEE INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2016, : 1468 - 1469
  • [24] MFIBlocks: An effective blocking algorithm for entity resolution
    Kenig, Batya
    Gal, Avigdor
    INFORMATION SYSTEMS, 2013, 38 (06) : 908 - 926
  • [25] Active in-context learning for cross-domain entity resolution
    Zhang, Ziheng
    Zeng, Weixin
    Tang, Jiuyang
    Huang, Hongbin
    Zhao, Xiang
    INFORMATION FUSION, 2025, 117
  • [26] Low-resource entity resolution with domain generalization and active learning
    Xu, Zhihong
    Wang, Ning
    NEUROCOMPUTING, 2024, 599
  • [27] Heterogeneous Committee-Based Active Learning for Entity Resolution (HeALER)
    Chen, Xiao
    Xu, Yinlong
    Broneske, David
    Durand, Gabriel Campero
    Zoun, Roman
    Saake, Gunter
    ADVANCES IN DATABASES AND INFORMATION SYSTEMS, ADBIS 2019, 2019, 11695 : 69 - 85
  • [28] Low-resource Deep Entity Resolution with Transfer and Active Learning
    Kasai, Jungo
    Qian, Kun
    Gurajada, Sairam
    Li, Yunyao
    Popa, Lucian
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 5851 - 5861
  • [29] Comparative Analysis of Approximate Blocking Techniques for Entity Resolution
    Papadakis, George
    Svirsky, Jonathan
    Gal, Avigdor
    Palpanas, Themis
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2016, 9 (09) : 684 - 695
  • [30] Improved suffix blocking for record linkage and entity resolution
    Allam, Amin
    Skiadopoulos, Spiros
    Kalnis, Panos
    DATA & KNOWLEDGE ENGINEERING, 2018, 117 : 98 - 113