Active Blocking Scheme Learning for Entity Resolution

Cited by: 3
Authors
Shao, Jingyu [1]
Wang, Qing [1]
Affiliation
[1] Australian Natl Univ, Res Sch Comp Sci, Canberra, ACT, Australia
Funding
Australian Research Council
Keywords
Entity resolution; Blocking scheme; Active learning; Record linkage
DOI
10.1007/978-3-319-93037-4_28
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Blocking is an important part of entity resolution. It aims to improve time efficiency by grouping potentially matching records into the same block. Both supervised and unsupervised approaches have been proposed, but each has limitations: they either require a large number of labels or cannot easily guarantee blocking quality. To address these issues, we propose a blocking scheme learning approach based on active learning techniques. With a limited label budget, our approach can learn a blocking scheme that generates high-quality blocks. Two strategies, called active sampling and active branching, are proposed to select samples and generate blocking schemes efficiently. We experimentally verify that our approach outperforms several baseline approaches on four real-world datasets.
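To make the idea of blocking concrete, the following is a minimal sketch of grouping records by a blocking key so that only records sharing a block are compared. It is not the authors' learning approach: the toy records, the field names, and the `blocking_key` predicate (same surname and same city) are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records; field names and values are illustrative only.
records = {
    1: {"name": "Jingyu Shao", "city": "Canberra"},
    2: {"name": "J. Shao", "city": "Canberra"},
    3: {"name": "Qing Wang", "city": "Canberra"},
    4: {"name": "Qing Wang", "city": "Sydney"},
}


def blocking_key(record):
    """Hypothetical blocking scheme: same surname (last token) AND same city."""
    surname = record["name"].split()[-1].lower()
    return (surname, record["city"].lower())


def build_blocks(records, key_fn):
    """Group record ids by the value of their blocking key."""
    blocks = defaultdict(list)
    for record_id, record in records.items():
        blocks[key_fn(record)].append(record_id)
    return dict(blocks)


def candidate_pairs(blocks):
    """Only records that share a block become candidate pairs."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


blocks = build_blocks(records, blocking_key)
pairs = candidate_pairs(blocks)

# Number of comparisons without blocking: every pair of records.
total_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} candidate pair(s) instead of {total_pairs}")
```

In this toy setting, the scheme keeps the comparison between records 1 and 2 and skips the rest. A scheme that is too strict can also drop true matches, which is why the choice of scheme matters and why learning it from labeled samples is of interest.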
Pages: 350-362
Number of pages: 13