Damming the genomic data flood using a comprehensive analysis and storage data structure

Cited by: 3
Authors
Bouffard, Marc [1]
Phillips, Michael S. [1, 2, 3]
Brown, Andrew M. K. [1, 2, 3]
Marsh, Sharon [1]
Tardif, Jean-Claude [1, 2, 3]
van Rooij, Tibor [1]
Affiliations
[1] Univ Montreal, Beaulieu Saucier Univ Montreal Pharmacogen Ctr, Montreal, PQ, Canada
[2] Univ Montreal, Montreal Heart Inst, Montreal, PQ, Canada
[3] Univ Montreal, Fac Med, Montreal, PQ H3C 3J7, Canada
DOI
10.1093/database/baq029
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Data generation, driven by rapid advances in genomic technologies, is fast outpacing our analysis capabilities. Faced with this flood of data, more hardware and software resources are added to accommodate data sets whose structure has not specifically been designed for analysis. This leads to unnecessarily lengthy processing times and excessive data handling and storage costs. Current efforts to address this have centered on developing new indexing schemas and analysis algorithms, whereas the root of the problem lies in the format of the data itself. We have developed a new data structure for storing and analyzing genotype and phenotype data. By leveraging data normalization techniques, database management system capabilities and the use of a novel multi-table, multidimensional database structure we have eliminated the following: (i) unnecessarily large data set size due to high levels of redundancy, (ii) sequential access to these data sets and (iii) common bottlenecks in analysis times. The resulting novel data structure horizontally divides the data to circumvent traditional problems associated with the use of databases for very large genomic data sets. The resulting data set required 86% less disk space and performed analytical calculations 6248 times faster compared to a standard approach without any loss of information.
Pages: 7
Related papers
50 results in total
  • [41] Comprehensive post-genomic data analysis approaches integrating biochemical pathway maps
    Lange, BM
    Ghassemian, M
    PHYTOCHEMISTRY, 2005, 66 (04) : 413 - 451
  • [42] Efficacy of a structured workflow for the interpretation of comprehensive genomic analysis data in clinical routine.
    Rieke, Damian Tobias
    Lamping, Mario
    Klauschen, Frederick
    Ochsenreither, Sebastian
    Schutte, Moritz
    Kessler, Thomas
    Klinghammer, Konrad Friedrich
    Joehrens, Korinna
    Messerschmidt, Clemens
    Lenze, Dido
    Burock, Susen
    Ditzen, Doreen
    Schaefer, Reinhold
    Pavel, Marianne
    Tinhofer, Inge
    Sers, Christine
    Beule, Dieter
    Yaspo, Marie-Laure
    Leyvraz, Serge
    Keilholz, Ulrich
    JOURNAL OF CLINICAL ONCOLOGY, 2018, 36 (15)
  • [43] Comprehensive Molecular Analysis of Oligodendroglial Tumors. Merging Genomic, Transcriptomic and Metabolomic Data
    Sevlever, Gustavo
    Ferrer-Luna, Ruben
    Nunez, Lina
    Calvar, Jorge
    Celda, Bernardo
    Martinetto, Horacio
    JOURNAL OF NEUROPATHOLOGY AND EXPERIMENTAL NEUROLOGY, 2010, 69 (05) : 522 - 522
  • [44] Haplotypes, epistatic interaction, and genomic pathways: Examples for comprehensive analysis of multivariate ordinal data
    Wittkowski, KM
    Pereira, M
    JOURNAL OF INVESTIGATIVE MEDICINE, 2003, 51 : S362 - S363
  • [45] Comprehensive molecular analysis of oligodendroglial tumors. merging genomic, transcriptomic and metabolomic data
    Martinetto, H.
    Ferrer-Luna, R.
    Nunez, L.
    Arias, E.
    Calvar, J.
    Taratuto, A.
    Celda, B.
    Sevlever, G.
    BRAIN PATHOLOGY, 2010, 20 : 63 - 63
  • [46] Genomic landscape of cutaneous, acral, mucosal, and uveal melanoma in Japan: analysis of clinical comprehensive genomic profiling data
    Hida, Tokimasa
    Kato, Junji
    Idogawa, Masashi
    Tokino, Takashi
    Uhara, Hisashi
    INTERNATIONAL JOURNAL OF CLINICAL ONCOLOGY, 2024, 29 (12) : 1984 - 1998
  • [47] FLOOD-PLAIN DELINEATION USING MULTISPECTRAL DATA-ANALYSIS
    HARKER, GR
    ROUSE, JW
    PHOTOGRAMMETRIC ENGINEERING AND REMOTE SENSING, 1977, 43 (01) : 81 - &
  • [48] Empirical analysis of flood risk perception using historical data in Tokyo
    Sado-Inamura, Yukako
    Fukushi, Kensuke
    LAND USE POLICY, 2019, 82 : 13 - 29
  • [49] Uncertainty Analysis of Flood Disaster Assessment Using Remote Sensing Data
    Du, Cong
    Yan, Fuli
    Liu, Jing
    2006 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, VOLS 1-8, 2006, : 1071 - 1073
  • [50] Improvements to Flood Frequency Analysis on Alluvial Rivers Using Paleoflood Data
    Reinders, Joeri B.
    Munoz, Samuel E.
    WATER RESOURCES RESEARCH, 2021, 57 (04)