Robust Temporal Graph Clustering for Group Record Linkage

Cited by: 4
Authors
Nanayakkara, Charini [1]
Christen, Peter [1]
Ranbaduge, Thilina [1]
Affiliation
[1] Australian Natl Univ, Res Sch Comp Sci, Canberra, ACT 2600, Australia
Funding
Australian Research Council
Keywords
Entity resolution; Star clustering; Vital records; Birth bundling
DOI
10.1007/978-3-030-16145-3_41
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Research in the social sciences is increasingly based on large and complex data collections, where individual data sets from different domains need to be linked to allow advanced analytics. A popular type of data used in such a context is historical registries containing birth, death, and marriage certificates. Individually, however, such data sets limit the types of studies that can be conducted; specifically, they make it impossible to track individuals, families, or households over time. Once such data sets are linked and family trees are available, it becomes possible, for example, to investigate how education, health, mobility, and employment influence the lives of people over two or more generations. The linkage of historical records is challenging because of data quality issues and because ground truth data are often unavailable. Unsupervised techniques therefore need to be employed, and these are generally based on similarity graphs generated by comparing individual records. In this paper we present a novel temporal clustering approach aimed at linking records of the same group (such as all births by the same mother) where temporal constraints (such as intervals between births) need to be enforced. We combine a connected component approach with an iterative merging step that considers temporal constraints to obtain accurate clustering results. Experiments on a real Scottish data set show the superiority of our approach over a previous clustering approach for record linkage.
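To illustrate the general idea of combining connected components with a constraint-aware merging step, here is a minimal Python sketch under stated assumptions. It is not the authors' published algorithm. The constants `MIN_GAP_DAYS` and `TWIN_GAP_DAYS`, the similarity `threshold`, and the functions `temporally_valid`, `connected_components`, and `temporal_cluster` are all hypothetical. Records are assumed to be birth certificates keyed by an ID with a birth date, and the similarity graph is assumed to be a list of `(id, id, similarity)` edges. Within each connected component of sufficiently similar records, clusters are grown greedily along the strongest edges, and a merge is accepted only if every pair of births in the result is either a plausible twin pair or separated by at least a minimum interval.

```python
from datetime import date
from itertools import combinations

# Hypothetical thresholds (assumptions for illustration, not values from the paper).
MIN_GAP_DAYS = 270  # assumed minimum interval between non-twin births by one mother
TWIN_GAP_DAYS = 2   # assumed tolerance for births registered as twins


def temporally_valid(dates):
    """Return True if every pair of birth dates is either a plausible twin
    pair or separated by at least MIN_GAP_DAYS."""
    for a, b in combinations(dates, 2):
        gap = abs((a - b).days)
        if TWIN_GAP_DAYS < gap < MIN_GAP_DAYS:
            return False
    return True


def connected_components(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, _ in edges:
        parent[find(u)] = find(v)
    comps = {}
    for n in nodes:
        comps.setdefault(find(n), []).append(n)
    return list(comps.values())


def temporal_cluster(birth_dates, edges, threshold=0.8):
    """birth_dates: record id -> date; edges: (id1, id2, similarity).
    Returns clusters as sorted lists of record ids."""
    strong = [e for e in edges if e[2] >= threshold]
    clusters = []
    for comp in connected_components(birth_dates, strong):
        members = set(comp)
        # Start from singletons and merge along the strongest edges first.
        cluster_of = {r: frozenset([r]) for r in comp}
        comp_edges = sorted((e for e in strong if e[0] in members),
                            key=lambda e: e[2], reverse=True)
        for u, v, _ in comp_edges:
            cu, cv = cluster_of[u], cluster_of[v]
            if cu == cv:
                continue
            merged = cu | cv
            # Accept the merge only if the temporal constraints still hold.
            if temporally_valid([birth_dates[r] for r in merged]):
                for r in merged:
                    cluster_of[r] = merged
        clusters.extend(set(cluster_of.values()))
    return sorted(sorted(c) for c in clusters)


# Toy example with made-up records (not data from the paper).
births = {
    "b1": date(1870, 3, 1),
    "b2": date(1872, 5, 10),
    "b3": date(1870, 7, 20),  # only ~4.5 months after b1
    "b4": date(1872, 5, 11),  # twin of b2
}
edges = [("b1", "b2", 0.9), ("b2", "b3", 0.85),
         ("b1", "b3", 0.82), ("b2", "b4", 0.95)]
print(temporal_cluster(births, edges))  # [['b1', 'b2', 'b4'], ['b3']]
```

In this toy example, b3 is similar to the other records but was born only about four and a half months after b1, so the temporal check keeps it out of the group formed by b1, b2, and b2's twin b4. Processing edges in descending order of similarity is one simple design choice: it lets the most confident links fix a cluster's timeline before weaker links are considered.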
Pages: 526-538
Number of pages: 13