A (fire)cloud-based DNA methylation data preprocessing and quality control platform

Cited by: 5
Authors
Kangeyan, Divy [1, 2]
Dunford, Andrew [2]
Iyer, Sowmya [3]
Stewart, Chip [2]
Hanna, Megan [2]
Getz, Gad [2, 3, 4, 5]
Aryee, Martin J. [1, 2, 3, 4, 5]
Affiliations
[1] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA 02115 USA
[2] Broad Inst MIT & Harvard, Cambridge, MA 02142 USA
[3] Massachusetts Gen Hosp, Dept Pathol, Boston, MA 02114 USA
[4] Harvard Med Sch, Dept Pathol, Boston, MA 02115 USA
[5] Massachusetts Gen Hosp, Canc Ctr, Boston, MA 02114 USA
Keywords
DNA methylation; Cloud computing; Bioinformatics workflows; Quality control analysis
DOI
10.1186/s12859-019-2750-4
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Bisulfite sequencing allows base-pair resolution profiling of DNA methylation and has recently been adapted for use in single cells. Analyzing these data, including comparing them with existing data, remains challenging because of the scale of the data and differences in preprocessing methods between published datasets.

Results: We present a set of preprocessing pipelines for bisulfite sequencing DNA methylation data that includes a new R/Bioconductor package, scmeth, for a series of efficient QC analyses of large datasets. The pipelines go from raw data to CpG-level methylation estimates and can be run, with identical results, on a single computer, on an HPC cluster, or on Google Cloud compute resources. These pipelines are designed to 1) ensure reproducibility of analyses, 2) scale to large whole-genome datasets with more than 100 GB of raw data per sample and to single-cell datasets with thousands of cells, 3) enable integration and comparison between user-provided data and publicly available data, since all samples can be processed through the same pipeline, and 4) provide access to best-practice analysis pipelines. Pipelines are provided for whole genome bisulfite sequencing (WGBS), reduced representation bisulfite sequencing (RRBS), and hybrid selection (capture) bisulfite sequencing (HSBS).

Conclusions: The workflows produce data quality metrics, visualization tracks, and aggregated output for further downstream analysis. Optional use of cloud computing resources facilitates the analysis of large datasets and their integration with existing methylome profiles. The workflow design principles are applicable to other genomic data types.
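As a minimal illustration of what a "CpG-level methylation estimate" means, the sketch below computes the standard per-CpG methylation fraction from bisulfite read counts (reads that still show C at the CpG, i.e. methylated, divided by all reads covering it) and a simple coverage summary of the kind a QC step might report. This is not the scmeth API or the pipelines' own code: the `CpGCounts` record, the `methylation_level` and `coverage_summary` functions, and the `min_depth` threshold are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CpGCounts:
    """Hypothetical per-CpG record of bisulfite read counts (illustration only)."""
    chrom: str
    pos: int
    methylated: int    # reads reporting C (protected from conversion) at this CpG
    unmethylated: int  # reads reporting T (bisulfite-converted) at this CpG


def methylation_level(site: CpGCounts) -> Optional[float]:
    """Fraction of covering reads that are methylated; None if the CpG has no reads."""
    total = site.methylated + site.unmethylated
    if total == 0:
        return None
    return site.methylated / total


def coverage_summary(sites: Iterable[CpGCounts], min_depth: int = 1) -> dict:
    """Hypothetical QC summary: CpGs with at least min_depth reads and their mean methylation."""
    covered = [s for s in sites if s.methylated + s.unmethylated >= min_depth]
    levels = [methylation_level(s) for s in covered]
    mean = sum(levels) / len(levels) if levels else float("nan")
    return {"n_cpgs_covered": len(covered), "mean_methylation": mean}


if __name__ == "__main__":
    sites = [
        CpGCounts("chr1", 100, methylated=8, unmethylated=2),
        CpGCounts("chr1", 150, methylated=0, unmethylated=0),
        CpGCounts("chr1", 200, methylated=3, unmethylated=3),
    ]
    print([methylation_level(s) for s in sites])
    print(coverage_summary(sites, min_depth=1))
```

In a single-cell setting the same summary could be computed once per cell, so that cells with too few covered CpGs can be flagged before downstream analysis; the depth threshold is a user choice, not a value taken from the paper.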
Pages: 5