Damming the genomic data flood using a comprehensive analysis and storage data structure

Cited by: 3
Authors
Bouffard, Marc [1]
Phillips, Michael S. [1, 2, 3]
Brown, Andrew M. K. [1, 2, 3]
Marsh, Sharon [1]
Tardif, Jean-Claude [1, 2, 3]
van Rooij, Tibor [1]
Affiliations
[1] Univ Montreal, Beaulieu Saucier Univ Montreal Pharmacogen Ctr, Montreal, PQ, Canada
[2] Univ Montreal, Montreal Heart Inst, Montreal, PQ, Canada
[3] Univ Montreal, Fac Med, Montreal, PQ H3C 3J7, Canada
DOI
10.1093/database/baq029
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Data generation, driven by rapid advances in genomic technologies, is fast outpacing our analysis capabilities. Faced with this flood of data, more hardware and software resources are added to accommodate data sets whose structure has not specifically been designed for analysis. This leads to unnecessarily lengthy processing times and excessive data handling and storage costs. Current efforts to address this have centered on developing new indexing schemas and analysis algorithms, whereas the root of the problem lies in the format of the data itself. We have developed a new data structure for storing and analyzing genotype and phenotype data. By leveraging data normalization techniques, database management system capabilities and the use of a novel multi-table, multidimensional database structure we have eliminated the following: (i) unnecessarily large data set size due to high levels of redundancy, (ii) sequential access to these data sets and (iii) common bottlenecks in analysis times. The resulting novel data structure horizontally divides the data to circumvent traditional problems associated with the use of databases for very large genomic data sets. The resulting data set required 86% less disk space and performed analytical calculations 6248 times faster compared to a standard approach without any loss of information.
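The abstract does not give the actual schema, so the following is only a minimal sketch of the general idea it describes: normalizing redundant genotype records into lookup tables and dividing the genotype rows horizontally across several tables. Every table name, column name and sample record here is hypothetical, the choice of chromosome as the partitioning key is an assumption, and SQLite stands in for whatever database management system the authors used.

```python
import sqlite3

# Hypothetical flat input: one row per (sample, marker), with marker metadata
# and the genotype string repeated on every row. Not data from the paper.
FLAT_ROWS = [
    ("S1", "rs100", "1", 1000, "AA"),
    ("S1", "rs200", "2", 2000, "AG"),
    ("S2", "rs100", "1", 1000, "AG"),
    ("S2", "rs200", "2", 2000, "GG"),
    ("S3", "rs100", "1", 1000, "AA"),
]


def partition_name(chromosome):
    """Return the (assumed) per-chromosome genotype table name."""
    # Table names cannot be bound as SQL parameters, so validate before use.
    if not chromosome.isalnum():
        raise ValueError(f"unexpected chromosome label: {chromosome!r}")
    return f"genotype_chr{chromosome}"


def build_database(rows):
    """Load flat rows into normalized lookup tables plus per-chromosome partitions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sample (sample_pk INTEGER PRIMARY KEY, sample_id TEXT UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE marker (marker_pk INTEGER PRIMARY KEY, snp_id TEXT UNIQUE, "
        "chromosome TEXT, position INTEGER)"
    )
    conn.execute(
        "CREATE TABLE genotype_code (code_pk INTEGER PRIMARY KEY, genotype TEXT UNIQUE)"
    )
    for sample_id, snp_id, chromosome, position, genotype in rows:
        # Normalization: each sample, marker and genotype string is stored once.
        conn.execute("INSERT OR IGNORE INTO sample (sample_id) VALUES (?)", (sample_id,))
        conn.execute(
            "INSERT OR IGNORE INTO marker (snp_id, chromosome, position) VALUES (?, ?, ?)",
            (snp_id, chromosome, position),
        )
        conn.execute(
            "INSERT OR IGNORE INTO genotype_code (genotype) VALUES (?)", (genotype,)
        )
        # Horizontal division: genotype rows go to one table per chromosome,
        # and each row holds only three small integer keys.
        table = partition_name(chromosome)
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} (sample_pk INTEGER, marker_pk INTEGER, "
            "code_pk INTEGER, PRIMARY KEY (sample_pk, marker_pk))"
        )
        conn.execute(
            f"INSERT INTO {table} (sample_pk, marker_pk, code_pk) "
            "SELECT s.sample_pk, m.marker_pk, c.code_pk "
            "FROM sample s, marker m, genotype_code c "
            "WHERE s.sample_id = ? AND m.snp_id = ? AND c.genotype = ?",
            (sample_id, snp_id, genotype),
        )
    conn.commit()
    return conn


def genotype_counts(conn, snp_id):
    """Count genotypes for one marker, reading only that marker's partition."""
    row = conn.execute(
        "SELECT marker_pk, chromosome FROM marker WHERE snp_id = ?", (snp_id,)
    ).fetchone()
    if row is None:
        return {}
    marker_pk, chromosome = row
    table = partition_name(chromosome)
    cursor = conn.execute(
        f"SELECT c.genotype, COUNT(*) FROM {table} g "
        "JOIN genotype_code c ON c.code_pk = g.code_pk "
        "WHERE g.marker_pk = ? GROUP BY c.genotype",
        (marker_pk,),
    )
    return dict(cursor.fetchall())


if __name__ == "__main__":
    connection = build_database(FLAT_ROWS)
    print(genotype_counts(connection, "rs100"))
```

In this sketch a per-marker query touches a single small partition through a primary-key lookup instead of scanning one large flat table sequentially, which is the kind of access pattern the abstract contrasts with the standard approach. It makes no attempt to reproduce the reported disk-space or speed figures.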
Pages: 7