Using blockchain to log genome dataset access: efficient storage and query

Times cited: 11
Authors
Gursoy, Gamze [1 ,2 ]
Bjornson, Robert [3 ,4 ]
Green, Molly E. [1 ,2 ]
Gerstein, Mark [1 ,2 ,4 ]
Affiliations
[1] Yale Univ, Program Computat Biol & Bioinformat, New Haven, CT 06520 USA
[2] Yale Univ, Dept Mol Biochem & Biophys, New Haven, CT 06520 USA
[3] Yale Ctr Res Comp, New Haven, CT 06520 USA
[4] Yale Univ, Dept Comp Sci, POB 2158, New Haven, CT 06520 USA
Funding
National Institutes of Health;
Keywords
Blockchain; Secure storage; Genomic data access log; GENOTYPES; DATABASE;
DOI
10.1186/s12920-020-0716-z
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Background: Genomic variants are considered sensitive information because they can reveal private facts about individuals. It is therefore important to control access to such data. A key aspect of controlled access is the secure storage and efficient querying of access logs, so that they can be checked for potential misuse. However, securing logs poses challenges, such as designing against the consequences of "single points of failure". One potential approach to circumventing these challenges is blockchain technology, which is currently popular in cryptocurrency because of its security, immutability, and decentralization. One of the tasks of the 2018 iDASH (Integrating Data for Analysis, Anonymization, and Sharing) Secure Genome Analysis Competition was to develop time- and space-efficient blockchain-based ledgering solutions, using MultiChain, to log and query user activity accessing genomic datasets across multiple sites.

Methods: MultiChain is a specific blockchain platform that offers "data streams" embedded in the chain for rapid and secure data storage. We devised a storage protocol that takes advantage of the keys in MultiChain data streams, and we created a data frame from the chain that allows efficient querying. Our solution was selected as the winner of the iDASH competition at a workshop held in San Diego, CA, in October 2018. Although it worked well in the challenge, it has a drawback: it requires downloading all the data from the chain and keeping it in local memory for fast querying. To address this, we provide an alternative "bigmem" solution that uses indices rather than local storage for rapid queries.

Results: We profiled the performance of both solutions, for querying the chain and for inserting data into it, using logs with 100,000 to 600,000 entries. The challenge solution requires 12 seconds and 120 MB of memory to query 100,000 entries. Its memory requirement increases linearly, reaching 470 MB for a chain with 600,000 entries. Although the alternative bigmem solution is slower and requires more memory (408 seconds and 250 MB, respectively, for 100,000 entries), its memory requirement grows more slowly and reaches only 360 MB for 600,000 entries.

Conclusion: Overall, we demonstrate that genomic access log files can be stored and queried efficiently with blockchain. Beyond this, our protocol could potentially be applied to other types of health data, such as electronic health records.
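The abstract contrasts two query strategies: building a local data frame from every stream item (the challenge solution) and answering queries through key indices instead of a local copy (the idea behind "bigmem"). The Python sketch below illustrates that contrast under stated assumptions. `MockStream` is a hypothetical in-memory stand-in for one MultiChain stream (a real node would be reached through RPC calls such as `publish` and `liststreamkeyitems`), and the `LogEntry` fields, the `field:value` key format, and all function names are illustrative. It is not a reproduction of the authors' storage protocol or of their bigmem implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from operator import attrgetter


# Hypothetical access-log record: field names are illustrative assumptions,
# not the exact iDASH 2018 log schema.
@dataclass(frozen=True)
class LogEntry:
    timestamp: int
    node: str
    user: str
    activity: str
    resource: str


QUERYABLE_FIELDS = ("node", "user", "activity", "resource")
by_time = attrgetter("timestamp")


class MockStream:
    """In-memory stand-in for one MultiChain data stream (no RPC involved).

    Items are append-only; each is published under several keys, and the
    mock "node" keeps a key -> positions index so items can be fetched by key.
    """

    def __init__(self):
        self._items = []
        self._key_index = defaultdict(list)

    def publish(self, keys, entry):
        pos = len(self._items)
        self._items.append(entry)
        for key in keys:
            self._key_index[key].append(pos)

    def all_items(self):
        # Every item in the stream, as a full download would return.
        return list(self._items)

    def items_by_key(self, key):
        # Only the items published under `key` (empty list if none).
        return [self._items[pos] for pos in self._key_index.get(key, [])]


def keys_for(entry):
    # One hypothetical "field:value" key per queryable field, e.g. "user:alice".
    return [f"{name}:{getattr(entry, name)}" for name in QUERYABLE_FIELDS]


def matches(entry, conditions):
    return all(getattr(entry, name) == value for name, value in conditions.items())


def query_local(stream, **conditions):
    """Challenge-style: copy the whole stream into a local table, then filter."""
    table = stream.all_items()  # client memory grows with the full log size
    return sorted((e for e in table if matches(e, conditions)), key=by_time)


def query_by_key(stream, **conditions):
    """Index-style: fetch only the items under one key, then filter the rest."""
    if not conditions:
        return sorted(stream.all_items(), key=by_time)
    name, value = next(iter(conditions.items()))
    candidates = stream.items_by_key(f"{name}:{value}")
    return sorted((e for e in candidates if matches(e, conditions)), key=by_time)
```

Both query functions return the same rows; they differ in what the client must hold. `query_local` materializes every entry before filtering, whereas `query_by_key` only pulls the items stored under the first requested key. Range queries on timestamps would need a different key scheme, which this sketch does not attempt.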
Pages: 9
Journal: BMC Medical Genomics, Vol. 13