A (fire)cloud-based DNA methylation data preprocessing and quality control platform

Times cited: 5
Authors
Kangeyan, Divy [1, 2]
Dunford, Andrew [2]
Iyer, Sowmya [3]
Stewart, Chip [2]
Hanna, Megan [2]
Getz, Gad [2, 3, 4, 5]
Aryee, Martin J. [1, 2, 3, 4, 5]
Affiliations
[1] Harvard T.H. Chan School of Public Health, Department of Biostatistics, Boston, MA 02115, USA
[2] Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
[3] Massachusetts General Hospital, Department of Pathology, Boston, MA 02114, USA
[4] Harvard Medical School, Department of Pathology, Boston, MA 02115, USA
[5] Massachusetts General Hospital, Cancer Center, Boston, MA 02114, USA
Keywords
DNA methylation; Cloud computing; Bioinformatics workflows; Quality control analysis
DOI
10.1186/s12859-019-2750-4
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Bisulfite sequencing allows base-pair-resolution profiling of DNA methylation and has recently been adapted for use in single cells. Analyzing these data, including comparing them with existing data, remains challenging because of the scale of the data and the differences in preprocessing methods between published datasets.

Results: We present a set of preprocessing pipelines for bisulfite sequencing DNA methylation data that include a new R/Bioconductor package, scmeth, for a series of efficient QC analyses of large datasets. The pipelines go from raw data to CpG-level methylation estimates and can be run, with identical results, on a single computer, on an HPC cluster, or on Google Cloud Compute resources. These pipelines are designed to allow users to 1) ensure the reproducibility of analyses, 2) scale to large whole-genome datasets with 100 GB or more of raw data per sample and to single-cell datasets with thousands of cells, 3) integrate and compare user-provided data with publicly available data, since all samples can be processed through the same pipeline, and 4) access best-practice analysis pipelines. Pipelines are provided for whole-genome bisulfite sequencing (WGBS), reduced representation bisulfite sequencing (RRBS), and hybrid selection (capture) bisulfite sequencing (HSBS).

Conclusions: The workflows produce data quality metrics, visualization tracks, and aggregated output for further downstream analysis. Optional use of cloud computing resources facilitates the analysis of large datasets and integration with existing methylome profiles. The workflow design principles are applicable to other genomic data types.
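The abstract describes pipelines that turn raw reads into CpG-level methylation estimates and per-sample quality metrics. As an illustration only, the sketch below shows the standard arithmetic behind such outputs, assuming per-CpG counts of methylated (read as C) and unmethylated (read as T after bisulfite conversion) reads are already available from an aligner. The names `CpGCall`, `summarize_sample` and `conversion_rate` are hypothetical; they are not part of scmeth or of the authors' pipelines, and the conversion-rate estimate assumes non-CpG cytosines are essentially unmethylated, which does not hold for every cell type.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CpGCall:
    """Read counts at one CpG site (hypothetical record type, not scmeth's)."""
    chrom: str
    pos: int
    methylated: int    # reads with C at the CpG: protected from bisulfite conversion
    unmethylated: int  # reads with T at the CpG: converted, hence unmethylated

    @property
    def coverage(self) -> int:
        return self.methylated + self.unmethylated

    @property
    def beta(self) -> float:
        """Fraction of reads that are methylated at this site."""
        if self.coverage == 0:
            raise ValueError(f"no reads cover {self.chrom}:{self.pos}")
        return self.methylated / self.coverage


def summarize_sample(calls: List[CpGCall], min_coverage: int = 1) -> Dict[str, float]:
    """Hypothetical per-sample QC: CpGs passing a coverage filter and their mean methylation."""
    if min_coverage < 1:
        raise ValueError("min_coverage must be at least 1")
    covered = [c for c in calls if c.coverage >= min_coverage]
    n = len(covered)
    mean_beta = sum(c.beta for c in covered) / n if n else math.nan
    return {"cpgs_covered": n, "mean_methylation": mean_beta}


def conversion_rate(non_cpg_c: int, non_cpg_t: int) -> float:
    """Estimate bisulfite conversion efficiency from non-CpG cytosines.

    Assumes non-CpG cytosines are essentially unmethylated, so a C still
    read as C at those positions reflects incomplete conversion.
    """
    total = non_cpg_c + non_cpg_t
    if total == 0:
        raise ValueError("no non-CpG cytosine calls")
    return non_cpg_t / total


if __name__ == "__main__":
    calls = [
        CpGCall("chr1", 100, methylated=8, unmethylated=2),
        CpGCall("chr1", 150, methylated=0, unmethylated=5),
        CpGCall("chr1", 200, methylated=0, unmethylated=0),  # uncovered site
    ]
    print(summarize_sample(calls))                  # 2 CpGs covered, mean 0.4
    print(summarize_sample(calls, min_coverage=6))  # 1 CpG covered, mean 0.8
    print(conversion_rate(non_cpg_c=1, non_cpg_t=99))
```

Raising `min_coverage` trades the number of CpGs retained for more precise per-site estimates. In sparse single-cell data, where most sites are covered by only one or two reads, this trade-off is one of the main levers a QC report would expose.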
Pages: 5
Journal: BMC Bioinformatics, 20