Secure large-scale genome data storage and query

Cited by: 9
Authors
Chen, Luyao [1 ,4 ]
Aziz, Md Momin [2 ,4 ]
Mohammed, Noman [2 ]
Jiang, Xiaoqian [3 ]
Affiliations
[1] Carnegie Mellon Univ, Heinz Coll, Pittsburgh, PA 15213 USA
[2] Univ Manitoba, Comp Sci, Winnipeg, MB, Canada
[3] Univ Texas Hlth Sci Ctr Houston, Sch Biomed Informat, Houston, TX 77030 USA
[4] Univ Calif San Diego, Dept Biomed Informat, La Jolla, CA 92093 USA
Funding
National Institutes of Health (USA); Natural Sciences and Engineering Research Council of Canada;
Keywords
Secure genome data storage; Graph database; Secure computation on genome data; Homomorphic encryption; Genome data storage Neo4j; SEQUENCING DATA;
DOI
10.1016/j.cmpb.2018.08.007
Chinese Library Classification
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Background and Objective: Cloud computing plays a vital role in big data science with its scalable and cost-efficient architecture. Large-scale genome data storage and computation would benefit from the latest cloud computing infrastructures, saving cost and speeding up discoveries. However, due to privacy and security concerns, data owners are often disinclined to put sensitive data in a public cloud environment without enforcing protective measures. An ideal solution is to develop a secure genome database that supports encrypted data deposition and query. Methods: Nevertheless, it is challenging to make such a system fast and scalable enough to handle real-world demands while also providing data security. In this paper, we propose a novel mechanism to support secure count queries on an open-source graph database (Neo4j) and evaluate its performance on a real-world dataset of around 735,317 Single Nucleotide Polymorphisms (SNPs). In particular, we propose a new tree indexing method whose time complexity is constant (proportional to the tree depth), addressing the bottleneck of existing approaches. Results: The proposed method significantly improves the runtime of query execution compared to existing techniques. It takes less than one minute to execute an arbitrary count query on a dataset of 212 GB, while the best-known algorithm takes around 7 min. Conclusions: The outlined framework and experimental results show the applicability of graph databases for securely storing large-scale genome data in an untrusted environment. Furthermore, the underlying cryptosystem and security assumptions are well suited to such use cases and can be generalized in future work. © 2018 Elsevier B.V. All rights reserved.
Pages: 129-137
Number of pages: 9