AKSmooth: Enhancing low-coverage bisulfite sequencing data via kernel-based smoothing

Authors
Chen, Junfang [1 ,2 ]
Lutsik, Pavlo [2 ]
Akulenko, Ruslan [1 ]
Walter, Joern [2 ]
Helms, Volkhard [1 ]
Affiliations
[1] Univ Saarland, Ctr Bioinformat, D-66123 Saarbrucken, Germany
[2] Univ Saarland, Dept Genet, D-66123 Saarbrucken, Germany
Keywords
DNA methylation; whole-genome bisulfite sequencing; read coverage; DNA METHYLATION; CANCER EPIGENOMICS; 5-METHYLCYTOSINE; IDENTIFICATION; PIPELINE; CELLS;
DOI
10.1142/S0219720014420050
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Whole-genome bisulfite sequencing (WGBS) is an approach of growing importance. It is the only approach that provides a comprehensive picture of the genome-wide DNA methylation profile. However, obtaining sufficient genome and read coverage typically requires high sequencing costs. Bioinformatics tools can reduce this cost burden by improving the quality of sequencing data. We have developed a statistical method, Adjusted Local Kernel Smoother (AKSmooth), that can accurately and efficiently reconstruct the single-CpG methylation estimate across the entire methylome using low-coverage bisulfite sequencing (Bi-Seq) data. We demonstrate the performance of AKSmooth on the low-coverage (~4x) DNA methylation profiles of three human colon cancer samples and matched controls. Under the best set of parameters, AKSmooth-curated data showed high concordance with the gold-standard high-coverage sample (Pearson 0.90), outperforming the popular analogous method. In addition, AKSmooth was computationally efficient, with a benchmarked runtime over 4.5 times better than that of the reference tool. To summarize, AKSmooth is a simple and efficient tool that can provide an accurate human colon methylome estimation profile from low-coverage WGBS data. The proposed method is implemented in R and is available at https://github.com/Junfang/AKSmooth.
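The general idea of kernel-based smoothing of low-coverage methylation calls, and of scoring the result against a high-coverage reference with a Pearson correlation, can be illustrated with a minimal sketch. This is not the AKSmooth algorithm and not its R code: it is a generic coverage-weighted Gaussian kernel average written in Python, and the function names, the bandwidth value, and the toy data are all hypothetical choices made for illustration.

```python
import math


def kernel_smooth_methylation(positions, meth_counts, total_counts, bandwidth=500.0):
    """Generic coverage-weighted Gaussian kernel smoother (illustrative only).

    positions    -- genomic coordinates of CpG sites
    meth_counts  -- methylated read counts per site
    total_counts -- total read counts (coverage) per site
    bandwidth    -- kernel width in base pairs (hypothetical default)
    """
    smoothed = []
    for x0 in positions:
        num = 0.0
        den = 0.0
        for x, m, n in zip(positions, meth_counts, total_counts):
            if n == 0:
                continue  # sites without reads carry no information
            # Gaussian distance weight, scaled by coverage so that
            # well-covered neighbours contribute more to the estimate.
            w = math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) * n
            num += w * (m / n)
            den += w
        smoothed.append(num / den if den > 0 else float("nan"))
    return smoothed


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Toy low-coverage data (made up for illustration).
positions = [100, 150, 900, 1000, 5000]
meth = [3, 0, 1, 4, 0]
total = [4, 1, 1, 4, 3]
estimates = kernel_smooth_methylation(positions, meth, total)
```

In this sketch, a site with a single read borrows strength from nearby, better-covered sites, which is the intuition behind smoothing low-coverage data. Comparing `estimates` with levels from a deeply sequenced sample via `pearson` mirrors the kind of concordance measure quoted in the abstract.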
Pages: 17