Privacy-Preserving Record Linkage for Cardinality Counting

Cited by: 2
Authors
Wu, Nan [1]
Vatsalan, Dinusha [2]
Kaafar, Mohamed Ali [2]
Ramesh, Sanath Kumar [3]
Affiliations
[1] Macquarie Univ, CSIROs Data61, Sydney, Australia
[2] Macquarie Univ, Sydney, Australia
[3] CuresDev LLC, OpenTreatments Fdn, San Jose, CA USA
Keywords
Probabilistic counting; distinct-counting; fuzzy matching; Bloom filters; unsupervised learning; differential privacy
DOI
10.1145/3579856.3590338
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Several applications require counting the number of distinct items in the data, which is known as the cardinality counting problem. Examples include health applications, such as counting rare disease patients to support adequate awareness and funding and counting the cases of a new disease for outbreak detection; marketing applications, such as counting the audience reached by a new product; and cybersecurity applications, such as tracking the number of unique views of social media posts. The data needed for the counting is, however, often personal and sensitive, and needs to be processed using privacy-preserving techniques. Data quality issues in different databases, for example typos, errors, and variations, pose additional challenges for accurate cardinality estimation. While privacy-preserving cardinality counting has gained much attention in recent times and a few privacy-preserving algorithms have been developed for cardinality estimation, no work has so far been done on privacy-preserving cardinality counting using record linkage techniques with fuzzy matching and provable privacy guarantees. We propose a novel privacy-preserving record linkage algorithm that uses unsupervised clustering techniques to link and count the cardinality of individuals in multiple datasets without compromising their privacy or identity. In addition, existing Elbow methods, which take the optimal number of clusters as the cardinality, are far from accurate because they do not take into account the purity and completeness of the generated clusters. We therefore propose a novel method to find the optimal number of clusters in unsupervised learning. Our experimental results on real and synthetic datasets are highly promising: the method achieves a significantly smaller error rate, of less than 0.1 with a privacy budget epsilon = 1.0, than the state-of-the-art fuzzy matching and clustering method.
Pages: 53-64
Page count: 12
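The abstract and keywords mention Bloom filters, fuzzy matching, and clustering-based counting. The following is only a minimal illustrative sketch of that general idea, not the authors' algorithm: it encodes each name's character bigrams into a keyed Bloom filter, compares encodings with the Dice coefficient, and counts distinct entities with a simple greedy grouping. The function names, the parameters (`num_bits`, `num_hashes`, `secret`, `threshold`), and the greedy grouping step are all assumptions made for illustration, and the sketch adds no differential privacy noise.

```python
import hashlib
from typing import Iterable, List, Set


def bigrams(value: str) -> Set[str]:
    """Return the set of character bigrams of a normalised string."""
    s = value.lower().strip()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def bloom_encode(value: str, num_bits: int = 256, num_hashes: int = 4,
                 secret: str = "shared-secret") -> Set[int]:
    """Encode a string as the set of Bloom filter bit positions set to 1.

    Each bigram is hashed num_hashes times with a keyed SHA-256 digest.
    All parameter values here are hypothetical defaults.
    """
    positions: Set[int] = set()
    for gram in bigrams(value):
        for k in range(num_hashes):
            digest = hashlib.sha256(f"{secret}|{k}|{gram}".encode("utf-8")).digest()
            positions.add(int.from_bytes(digest[:8], "big") % num_bits)
    return positions


def dice(a: Set[int], b: Set[int]) -> float:
    """Dice coefficient between two sets of bit positions (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def count_distinct(values: Iterable[str], threshold: float = 0.7) -> int:
    """Estimate the number of distinct entities by greedy grouping.

    A value starts a new group unless its encoding is at least `threshold`
    similar to an existing group representative. This greedy step is a
    stand-in for the unsupervised clustering described in the abstract.
    """
    representatives: List[Set[int]] = []
    for value in values:
        encoded = bloom_encode(value)
        if not any(dice(encoded, rep) >= threshold for rep in representatives):
            representatives.append(encoded)
    return len(representatives)
```

Calling `count_distinct` on a list of names returns the number of groups formed. Near-duplicate spellings tend to share most bigrams and therefore most bit positions, so they usually fall into the same group, although the result depends on the chosen threshold and filter size.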