Robust Temporal Graph Clustering for Group Record Linkage

Cited by: 4
Authors
Nanayakkara, Charini [1]
Christen, Peter [1]
Ranbaduge, Thilina [1]
Affiliations
[1] Australian Natl Univ, Res Sch Comp Sci, Canberra, ACT 2600, Australia
Funding
Australian Research Council;
Keywords
Entity resolution; Star clustering; Vital records; Birth bundling;
DOI
10.1007/978-3-030-16145-3_41
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Research in the social sciences is increasingly based on large and complex data collections, where individual data sets from different domains need to be linked to allow advanced analytics. A popular type of data used in such a context are historical registries containing birth, death, and marriage certificates. Individually, such data sets however limit the types of studies that can be conducted. Specifically, it is impossible to track individuals, families, or households over time. Once such data sets are linked and family trees are available it is possible to, for example, investigate how education, health, mobility, and employment influence the lives of people over two or even more generations. The linkage of historical records is challenging because of data quality issues and because often there are no ground truth data available. Unsupervised techniques need to be employed, which generally are based on similarity graphs generated by comparing individual records. In this paper we present a novel temporal clustering approach aimed at linking records of the same group (such as all births by the same mother) where temporal constraints (such as intervals between births) need to be enforced. We combine a connected component approach with an iterative merging step which considers temporal constraints to obtain accurate clustering results. Experiments on a real Scottish data set show the superiority of our approach over a previous clustering approach for record linkage.
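The idea in the abstract of grouping records through a similarity graph while enforcing temporal constraints can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the 0.8 similarity threshold, and the 270-day and 1-day interval limits are all assumptions chosen for illustration. Edges are visited from most to least similar, and two clusters are merged only if every pair of births across them is either close enough to be a multiple birth or far enough apart to be separate births by the same mother.

```python
from datetime import date

# Hypothetical constraint values, not taken from the paper.
MIN_GAP_DAYS = 270   # assumed minimum interval between two separate births
TWIN_GAP_DAYS = 1    # assumed tolerance for multiple births (twins)

def compatible(d1, d2):
    """True if two birth dates could plausibly belong to the same mother."""
    gap = abs((d1 - d2).days)
    return gap <= TWIN_GAP_DAYS or gap >= MIN_GAP_DAYS

def temporal_clusters(births, edges, min_sim=0.8):
    """births: {record_id: date}; edges: [(id_a, id_b, similarity)]."""
    cluster_of = {rid: {rid} for rid in births}  # each record starts alone
    for a, b, sim in sorted(edges, key=lambda e: e[2], reverse=True):
        if sim < min_sim:
            break  # remaining edges are too weak to link records
        ca, cb = cluster_of[a], cluster_of[b]
        if ca is cb:
            continue  # already in the same cluster
        # Merge only if every cross-cluster pair satisfies the constraint.
        if all(compatible(births[x], births[y]) for x in ca for y in cb):
            merged = ca | cb
            for rid in merged:
                cluster_of[rid] = merged
    unique = {id(c): c for c in cluster_of.values()}
    return sorted(sorted(c) for c in unique.values())

births = {
    "b1": date(1870, 1, 10),
    "b2": date(1871, 6, 2),
    "b3": date(1870, 4, 1),   # only 81 days after b1
    "b4": date(1873, 3, 15),
}
edges = [("b1", "b2", 0.95), ("b2", "b3", 0.90),
         ("b3", "b4", 0.85), ("b1", "b4", 0.70)]
clusters = temporal_clusters(births, edges)
print(clusters)
```

With these made-up records, plain connected components over the edges above the threshold would place all four births in one group, whereas the constrained merge keeps b3 apart from b1 because their birth dates are too close together.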
Pages: 526-538
Number of pages: 13