Privacy-Preserving Record Linkage with Spark

Authors
Valkering, Onno [1]
Belloum, Adam [1]
Affiliations
[1] Univ Amsterdam, Amsterdam, Netherlands
Keywords
linkage; PPRL; privacy; scalability; Spark; HASH
DOI
10.1109/CCGRID.2019.000.58
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Privacy considerations demand careful and secure processing of personal data. This is especially true when personal data is linked against databases from other organizations. In such cases, privacy-preserving record linkage (PPRL) can be used to prevent needless exposure of sensitive information to other organizations. As more personal data is gathered and analyzed, scalable PPRL capable of handling massive databases is highly desirable. In this work, we evaluate Apache Spark as an option for scaling PPRL. Beyond the value of a scalable PPRL implementation in itself, one based on Spark would also be widely deployable and could take advantage of further development of the ecosystem. Our results show that a PPRL solution based on Spark outperforms alternatives when it comes to handling multiple millions of records, can scale to dozens of nodes, and is on par with regular record linkage implementations in terms of achieved results.
Pages: 440-448
Number of pages: 9