Efficient privacy-preserving whole-genome variant queries

Cited by: 4
Authors
Akguen, Mete [1, 2, 3, 4]
Pfeifer, Nico [2, 5, 6]
Kohlbacher, Oliver [2, 3, 7]
Affiliations
[1] Univ Tubingen, Dept Comp Sci, Med Data Privacy & Privacy Preserving ML Healthca, Tubingen, Germany
[2] Univ Tubingen, Inst Bioinformat & Med Informat, Tubingen, Germany
[3] Univ Hosp Tubingen, Translat Bioinformat, Tubingen, Germany
[4] Izmir Inst Technol, Dept Comp Engn, Izmir, Turkey
[5] Univ Tubingen, Dept Comp Sci, Methods Med Informat, Tubingen, Germany
[6] Max Planck Inst Informat, Stat Learning Computat Biol, Saarbrucken, Germany
[7] Univ Tubingen, Appl Bioinformat, Dept Comp Sci, Tubingen, Germany
DOI
10.1093/bioinformatics/btac070
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Diagnosis and treatment decisions based on genomic data have become widespread as the cost of genome sequencing steadily decreases. In this context, disease-gene association studies are of great importance. However, genomic data are highly sensitive compared to other data types and contain information about individuals and their relatives. Many studies have shown that this information can be inferred from query-response pairs on genomic databases. In this work, we propose a method that uses secure multi-party computation to query genomic databases in a privacy-preserving manner. The proposed solution privately outsources genomic data from arbitrarily many sources to two non-colluding proxies and allows genomic databases to be stored safely in semi-honest cloud environments. It provides data privacy, query privacy and output privacy by using XOR-based sharing and, unlike previous solutions, it allows queries to run efficiently on hundreds of thousands of genomes.
Results: We measure the performance of our solution with parameters similar to real-world applications. A genomic database with 3 000 000 variants can be queried with five genomic query predicates in under 400 ms. Querying 1 048 576 genomes, each containing 1 000 000 variants, for the presence of five different query variants can be achieved in approximately 6 min with a small amount of dedicated hardware and connectivity. These execution times are in the right range to enable real-world applications in medical research and healthcare. Unlike previous studies, it is possible to query multiple databases with response times fast enough for practical application. To the best of our knowledge, this is the first solution that provides this performance for querying large-scale genomic data.
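The abstract mentions XOR-based sharing between two non-colluding proxies. As a rough illustration of that general idea only, and not the authors' actual protocol, the following minimal sketch splits a bit vector of variant flags into two shares, each of which looks uniformly random on its own. The function names and the toy `genome` value are hypothetical.

```python
import secrets


def xor_share(bits: bytes) -> tuple[bytes, bytes]:
    """Split `bits` into two shares whose bytewise XOR equals `bits`."""
    share_a = secrets.token_bytes(len(bits))  # uniformly random mask
    share_b = bytes(x ^ m for x, m in zip(bits, share_a))
    return share_a, share_b


def xor_reconstruct(share_a: bytes, share_b: bytes) -> bytes:
    """Recombine two shares; this requires both proxies' shares."""
    return bytes(a ^ b for a, b in zip(share_a, share_b))


def xor_local(share_x: bytes, share_y: bytes) -> bytes:
    """XOR two shares held by the SAME proxy, with no communication."""
    return bytes(x ^ y for x, y in zip(share_x, share_y))


# Hypothetical toy data: bit i is 1 if variant i is present.
genome = bytes([0b10110010, 0b00000001])
query = bytes([0b00100000, 0b00000001])

g_a, g_b = xor_share(genome)  # g_a goes to proxy A, g_b to proxy B
q_a, q_b = xor_share(query)

# Sharing is lossless when both shares are combined.
assert xor_reconstruct(g_a, g_b) == genome

# XOR is linear over the shares: each proxy combines its own shares
# locally, and the results reconstruct to genome XOR query.
r_a, r_b = xor_local(g_a, q_a), xor_local(g_b, q_b)
expected = bytes(g ^ q for g, q in zip(genome, query))
assert xor_reconstruct(r_a, r_b) == expected
```

Only XOR can be evaluated locally in this way. Predicates that need AND-style logic, such as testing for the presence of several variants at once, require interaction between the proxies in typical secure multi-party computation schemes; that part is outside this sketch.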
Pages: 2202-2210
Page count: 9