Damming the genomic data flood using a comprehensive analysis and storage data structure

Cited by: 3
Authors
Bouffard, Marc [1]
Phillips, Michael S. [1, 2, 3]
Brown, Andrew M. K. [1, 2, 3]
Marsh, Sharon [1]
Tardif, Jean-Claude [1, 2, 3]
van Rooij, Tibor [1]
Affiliations
[1] Univ Montreal, Beaulieu Saucier Univ Montreal Pharmacogen Ctr, Montreal, PQ, Canada
[2] Univ Montreal, Montreal Heart Inst, Montreal, PQ, Canada
[3] Univ Montreal, Fac Med, Montreal, PQ H3C 3J7, Canada
DOI
10.1093/database/baq029
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Data generation, driven by rapid advances in genomic technologies, is fast outpacing our analysis capabilities. Faced with this flood of data, more hardware and software resources are added to accommodate data sets whose structure has not specifically been designed for analysis. This leads to unnecessarily lengthy processing times and excessive data handling and storage costs. Current efforts to address this have centered on developing new indexing schemas and analysis algorithms, whereas the root of the problem lies in the format of the data itself. We have developed a new data structure for storing and analyzing genotype and phenotype data. By leveraging data normalization techniques, database management system capabilities and the use of a novel multi-table, multidimensional database structure we have eliminated the following: (i) unnecessarily large data set size due to high levels of redundancy, (ii) sequential access to these data sets and (iii) common bottlenecks in analysis times. The resulting novel data structure horizontally divides the data to circumvent traditional problems associated with the use of databases for very large genomic data sets. The resulting data set required 86% less disk space and performed analytical calculations 6248 times faster compared to a standard approach without any loss of information.
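The abstract names the techniques (normalization, database management system features, and a horizontally divided multi-table layout) but not the schema itself. The sketch below is one possible illustration of those ideas and is not the authors' design: it uses SQLite as a stand-in database, and the table names (`sample`, `marker`, `genotype`, `calls_N`), the toy data, and the partition size `MARKERS_PER_TABLE` are all assumptions made for the example. Repeated strings are replaced by integer keys, genotype calls are split across several tables by marker, and an allele frequency is computed by reading only the one table that holds the marker of interest.

```python
import sqlite3

# Hypothetical toy data in a flat, redundant layout: (sample, marker, genotype).
FLAT_ROWS = [
    ("S1", "rs1", "AA"), ("S2", "rs1", "AG"), ("S3", "rs1", "GG"),
    ("S1", "rs2", "CC"), ("S2", "rs2", "CT"), ("S3", "rs2", "CC"),
]
MARKERS_PER_TABLE = 1  # assumed partition size, kept tiny for illustration


def build(conn, rows, markers_per_table=MARKERS_PER_TABLE):
    """Normalize flat rows and spread genotype calls over per-partition tables."""
    conn.executescript(
        """
        CREATE TABLE sample   (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE marker   (id INTEGER PRIMARY KEY, name TEXT UNIQUE, part INTEGER);
        CREATE TABLE genotype (id INTEGER PRIMARY KEY, code TEXT UNIQUE);
        """
    )
    sample_ids, marker_ids, genotype_ids = {}, {}, {}
    for sample, marker, code in rows:
        # Each distinct string is stored once; calls refer to it by integer key.
        if sample not in sample_ids:
            sample_ids[sample] = len(sample_ids) + 1
            conn.execute("INSERT INTO sample VALUES (?, ?)", (sample_ids[sample], sample))
        if code not in genotype_ids:
            genotype_ids[code] = len(genotype_ids) + 1
            conn.execute("INSERT INTO genotype VALUES (?, ?)", (genotype_ids[code], code))
        if marker not in marker_ids:
            marker_ids[marker] = len(marker_ids) + 1
            new_part = (marker_ids[marker] - 1) // markers_per_table
            conn.execute(
                "INSERT INTO marker VALUES (?, ?, ?)",
                (marker_ids[marker], marker, new_part),
            )
            # Horizontal division: one calls table per block of markers.
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS calls_{new_part} "
                "(sample_id INTEGER, marker_id INTEGER, genotype_id INTEGER, "
                "PRIMARY KEY (marker_id, sample_id))"
            )
        part = (marker_ids[marker] - 1) // markers_per_table
        conn.execute(
            f"INSERT INTO calls_{part} VALUES (?, ?, ?)",
            (sample_ids[sample], marker_ids[marker], genotype_ids[code]),
        )
    conn.commit()


def allele_frequency(conn, marker_name, allele):
    """Frequency of one allele at one marker, touching a single partition table."""
    row = conn.execute(
        "SELECT id, part FROM marker WHERE name = ?", (marker_name,)
    ).fetchone()
    if row is None:
        raise KeyError(marker_name)
    marker_id, part = row
    counts = conn.execute(
        f"SELECT g.code, COUNT(*) FROM calls_{part} AS c "
        "JOIN genotype AS g ON g.id = c.genotype_id "
        "WHERE c.marker_id = ? GROUP BY g.code",
        (marker_id,),
    ).fetchall()
    hits = sum(code.count(allele) * n for code, n in counts)
    total = sum(len(code) * n for code, n in counts)
    return hits / total


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build(connection, FLAT_ROWS)
    print(allele_frequency(connection, "rs1", "A"))
```

In this toy layout the lookup in `marker` tells the query which `calls_N` table to read, so the work per marker does not grow with the total number of markers stored. Whether this resembles the gains reported in the abstract depends on details of the published structure that the abstract does not give.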
Pages: 7