Privacy-Preserving Record Linkage for Cardinality Counting

Cited by: 2
Authors
Wu, Nan [1]
Vatsalan, Dinusha [2]
Kaafar, Mohamed Ali [2]
Ramesh, Sanath Kumar [3]
Affiliations
[1] Macquarie Univ, CSIRO's Data61, Sydney, Australia
[2] Macquarie Univ, Sydney, Australia
[3] CuresDev LLC, OpenTreatments Fdn, San Jose, CA USA
Keywords
Probabilistic counting; distinct-counting; fuzzy matching; Bloom filters; unsupervised learning; differential privacy
DOI
10.1145/3579856.3590338
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Several applications require counting the number of distinct items in a dataset, which is known as the cardinality counting problem. Examples include health applications, such as counting rare disease patients to support adequate awareness and funding and counting cases of a new disease for outbreak detection; marketing applications, such as counting the visibility reached by a new product; and cybersecurity applications, such as tracking the number of unique views of social media posts. The data needed for counting is, however, often personal and sensitive, and needs to be processed using privacy-preserving techniques. Data quality issues across databases, for example typos, errors and variations, pose additional challenges for accurate cardinality estimation. While privacy-preserving cardinality counting has gained much attention recently and a few privacy-preserving algorithms have been developed for cardinality estimation, no work has so far been done on privacy-preserving cardinality counting using record linkage techniques with fuzzy matching and provable privacy guarantees. We propose a novel privacy-preserving record linkage algorithm that uses unsupervised clustering techniques to link and count the cardinality of individuals in multiple datasets without compromising their privacy or identity. In addition, existing Elbow methods for finding the optimal number of clusters, taken as the cardinality, are far from accurate because they do not take into account the purity and completeness of the generated clusters. We therefore propose a novel method to find the optimal number of clusters in unsupervised learning. Our experimental results on real and synthetic datasets are highly promising: the proposed approach achieves a significantly smaller error rate, less than 0.1 with a privacy budget epsilon = 1.0, than the state-of-the-art fuzzy matching and clustering method.
Pages: 53-64
Number of pages: 12