Re-identification of individuals in genomic data-sharing beacons via allele inference

Cited by: 36
Authors
von Thenen, Nora [1]
Ayday, Erman [1,2]
Cicek, A. Ercument [1,3]
Affiliations
[1] Bilkent Univ, Comp Engn Dept, TR-06800 Ankara, Turkey
[2] Case Western Reserve Univ, Dept Elect Engn & Comp Sci, Cleveland, OH 44106 USA
[3] Carnegie Mellon Univ, Sch Comp Sci, Computat Biol Dept, Pittsburgh, PA 15213 USA
Keywords
PRIVACY;
DOI
10.1093/bioinformatics/bty643
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Genomic data-sharing beacons aim to provide a secure, easy-to-implement and standardized interface for data-sharing by only allowing yes/no queries on the presence of specific alleles in the dataset. Previously deemed secure against re-identification attacks, beacons were shown to be vulnerable despite their stringent policy. Recent studies have demonstrated that it is possible to determine whether the victim is in the dataset by repeatedly querying the beacon for his/her single-nucleotide polymorphisms (SNPs). Here, we propose a novel re-identification attack and show that the privacy risk is more serious than previously thought.

Results: Using the proposed attack, even if the victim systematically hides informative SNPs, it is possible to infer the alleles at positions of interest as well as the beacon query results with very high confidence. Our method is based on the fact that alleles at different loci are not necessarily independent. We use linkage disequilibrium and a high-order Markov chain-based algorithm for inference. We show that in a simulated beacon with 65 individuals from the European population, we can infer membership of individuals with 95% confidence with only 5 queries, even when SNPs with MAF <0.05 are hidden. We need fewer than 0.5% of the queries that existing works require to determine beacon membership under the same conditions. We show that countermeasures such as hiding certain parts of the genome or setting a query budget for the user would fail to protect the privacy of the participants.
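The two ingredients named in the abstract, a yes/no beacon interface and allele inference from neighbouring loci, can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given in this record. The `Beacon` class, the `infer_hidden_allele` function, the 0/1 allele encoding, and the tiny reference panel are all hypothetical, and the inference is a plain count-based k-th order Markov estimate with Laplace smoothing.

```python
from collections import Counter


class Beacon:
    """Toy beacon (hypothetical): answers only yes/no on allele presence."""

    def __init__(self, genomes):
        # genomes: list of dicts mapping SNP index -> allele (0 = major, 1 = minor)
        self._present = {(pos, a) for g in genomes for pos, a in g.items()}

    def query(self, pos, allele):
        # True if at least one individual in the dataset carries this allele
        return (pos, allele) in self._present


def infer_hidden_allele(reference, known, pos, k=2):
    """Estimate P(allele at pos == 1 | the k preceding alleles).

    reference: list of equal-length 0/1 lists (an assumed reference panel)
    known:     the victim's alleles at SNP indices pos-k .. pos-1
    """
    if pos < k or len(known) < k:
        raise ValueError("need k known alleles before pos")
    context = tuple(known[-k:])
    counts = Counter()
    for hap in reference:
        # Count what follows the same k-allele context in the reference panel
        if tuple(hap[pos - k:pos]) == context:
            counts[hap[pos]] += 1
    total = counts[0] + counts[1]
    # Laplace smoothing: an unseen context falls back to probability 0.5
    return (counts[1] + 1) / (total + 2)


# Assumed toy data: three haplotypes where 0,1 is followed by 1, one where 1,0 is followed by 0
reference = [[0, 1, 1], [0, 1, 1], [0, 1, 1], [1, 0, 0]]
beacon = Beacon([{0: 0, 1: 1, 2: 1}])

p_minor = infer_hidden_allele(reference, known=[0, 1], pos=2, k=2)
if p_minor > 0.5:
    # The victim hid SNP 2, but the attacker can still query the inferred allele
    answer = beacon.query(2, 1)
```

In this sketch the attacker never sees the victim's allele at SNP 2; the estimate comes only from correlated neighbouring alleles, which is why hiding low-MAF SNPs alone may not prevent the query.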
Pages: 365 - 371
Page count: 7
Related papers
50 in total
  • [1] The effect of kinship in re-identification attacks against genomic data sharing beacons
    Ayoz, Kerem
    Aysen, Miray
    Ayday, Erman
    Cicek, A. Ercument
    [J]. BIOINFORMATICS, 2020, 36 : I903 - I910
  • [2] Privacy Risks from Genomic Data-Sharing Beacons
    Shringarpure, Suyash S.
    Bustamante, Carlos D.
    [J]. AMERICAN JOURNAL OF HUMAN GENETICS, 2015, 97 (05) : 631 - 646
  • [3] Genomic Data-Sharing Practices
    Villanueva, Angela G.
    Cook-Deegan, Robert
    Robinson, Jill O.
    McGuire, Amy L.
    Majumder, Mary A.
    [J]. JOURNAL OF LAW MEDICINE & ETHICS, 2019, 47 (01) : 31 - 40
  • [4] Between Minimal and Greater Than Minimal Risk: How Research Participants and Oncologists Assess Data-Sharing and the Risk of Re-identification in Genomic Research
    Schleidgen S.
    Husedzinovic A.
    Ose D.
    Schickhardt C.
    von Kalle C.
    Winkler E.C.
    [J]. Philosophy & Technology, 2019, 32 (1) : 39 - 55
  • [5] How cancer patients and oncologist assess data-sharing and the risk of re-identification in genomic research? Ethical implications for informed consent and governance.
    Winkler, Eva Caroline
    Schleidgen, Sebastian
    Schickhardt, Christoph
    Kalle, Christof V.
    Ose, Dominik
    Husedzinovic, Alma
    [J]. JOURNAL OF CLINICAL ONCOLOGY, 2016, 34 (15)
  • [6] Genomic data-sharing: what will be our legacy?
    Callier, Shawneequa
    Husain, Rajah
    Simpson, Rachel
    [J]. FRONTIERS IN GENETICS, 2014, 5
  • [7] Re-identification of individuals in genomic datasets using public face images
    Venkatesaramani, Rajagopal
    Malin, Bradley A.
    Vorobeychik, Yevgeniy
    [J]. SCIENCE ADVANCES, 2021, 7 (47):
  • [8] Federated discovery and sharing of genomic data using Beacons
    Fiume, Marc
    Cupak, Miroslav
    Keenan, Stephen
    Rambla, Jordi
    de la Torre, Sabela
    Dyke, Stephanie O. M.
    Brookes, Anthony J.
    Carey, Knox
    Lloyd, David
    Goodhand, Peter
    Haeussler, Maximilian
    Baudis, Michael
    Stockinger, Heinz
    Dolman, Lena
    Lappalainen, Ilkka
    Tornroos, Juha
    Linden, Mikael
    Spalding, J. Dylan
    Ur-Rehman, Saif
    Page, Angela
    Flicek, Paul
    Sherry, Stephen
    Haussler, David
    Varma, Susheel
    Saunders, Gary
    Scollen, Serena
    [J]. NATURE BIOTECHNOLOGY, 2019, 37 (03) : 220 - 224
  • [10] Responsible Data Sharing: Identifying and Remedying Possible Re-Identification of Human Participants
    Morehouse, Kirsten N.
    Kurdi, Benedek
    Nosek, Brian A.
    [J]. AMERICAN PSYCHOLOGIST, 2024,