Secure large-scale genome data storage and query

Cited by: 9
Authors
Chen, Luyao [1, 4]
Aziz, Md Momin [2, 4]
Mohammed, Noman [2 ]
Jiang, Xiaoqian [3 ]
Affiliations
[1] Carnegie Mellon Univ, Heinz Coll, Pittsburgh, PA 15213 USA
[2] Univ Manitoba, Comp Sci, Winnipeg, MB, Canada
[3] Univ Texas Hlth Sci Ctr Houston, Sch Biomed Informat, Houston, TX 77030 USA
[4] Univ Calif San Diego, Dept Biomed Informat, La Jolla, CA 92093 USA
Funding
US National Institutes of Health (NIH); Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Secure genome data storage; Graph database; Secure computation on genome data; Homomorphic encryption; Genome data storage Neo4j; Sequencing data
DOI
10.1016/j.cmpb.2018.08.007
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Background and Objective: Cloud computing plays a vital role in big data science thanks to its scalable and cost-efficient architecture. Large-scale genome data storage and computation would benefit from current cloud infrastructure, both to save cost and to speed up discoveries. However, because of privacy and security concerns, data owners are often reluctant to put sensitive data in a public cloud environment without protective measures. An ideal solution is a secure genome database that supports encrypted data deposition and query.
Methods: Making such a system fast and scalable enough to handle real-world demands while also providing data security is a challenging task. In this paper, we propose a novel mechanism to support secure count queries on an open-source graph database (Neo4j) and evaluate its performance on a real-world dataset of around 735,317 Single Nucleotide Polymorphisms (SNPs). In particular, we propose a new tree indexing method that offers constant time complexity (proportional to the tree depth), addressing what was the bottleneck of existing approaches.
Results: The proposed method significantly improves query execution time compared to existing techniques. It takes less than one minute to execute an arbitrary count query on a dataset of 212 GB, while the best-known algorithm takes around 7 minutes.
Conclusions: The outlined framework and experimental results show that a graph database can be used to securely store large-scale genome data in an untrusted environment. Furthermore, the underlying cryptosystem and security assumptions are well suited to such use cases and could be generalized in future work. (C) 2018 Elsevier B.V. All rights reserved.
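The abstract does not spell out the tree index, so the following is only a plaintext sketch of the general idea that a count query on a tree index can cost time proportional to the tree depth rather than to the number of records. It is not the authors' method: there is no encryption and no Neo4j here, and the names `Node`, `build_index`, and `count_prefix`, as well as the toy genotype records, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One tree node: how many records pass through it, plus its children."""
    count: int = 0
    children: dict = field(default_factory=dict)


def build_index(records):
    """Build a prefix tree over records (assumption: each record lists
    genotypes for the same SNPs in the same fixed order)."""
    root = Node()
    for genotypes in records:
        node = root
        node.count += 1
        for g in genotypes:
            node = node.children.setdefault(g, Node())
            node.count += 1
    return root


def count_prefix(root, prefix):
    """Count records whose leading genotypes equal `prefix`.

    The loop runs at most len(prefix) times, so the cost is bounded by
    the tree depth and does not grow with the number of records."""
    node = root
    for g in prefix:
        node = node.children.get(g)
        if node is None:
            return 0
    return node.count


# Toy data, invented for illustration only.
records = [
    ("AG", "CC", "TT"),
    ("AG", "CC", "CT"),
    ("AA", "CC", "TT"),
    ("AG", "CT", "TT"),
]
index = build_index(records)
print(count_prefix(index, ("AG", "CC")))
```

In a secure setting the stored counts and the comparisons would operate on encrypted values (the keywords mention homomorphic encryption); that layer is deliberately left out of this sketch.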
Pages: 129-137
Page count: 9