De novo diploid genome assembly using long noisy reads

Authors
Fan Nie
Peng Ni
Neng Huang
Jun Zhang
Zhenyu Wang
Chuanle Xiao
Feng Luo
Jianxin Wang
Affiliations
[1] Central South University, School of Computer Science and Engineering
[2] Xiangjiang Laboratory
[3] Xiangtan University, National Center for Applied Mathematics in Hunan and Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education
[4] Central South University, Hunan Provincial Key Lab on Bioinformatics
[5] Guangdong Academy of Sciences, Institute of Nanfan & Seed Industry
[6] Sun Yat-sen University, State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, #7 Jinsui Road, Tianhe District
[7] Clemson University, School of Computing
Abstract
The high sequencing error rate has impeded the application of long noisy reads to diploid genome assembly. Most existing assemblers fail to generate high-quality phased assemblies from long noisy reads. Here, we present PECAT, a Phased Error Correction and Assembly Tool, for reconstructing diploid genomes from long noisy reads. We design a haplotype-aware error correction method that retains heterozygous alleles while correcting sequencing errors. We combine a corrected-read SNP caller and a raw-read SNP caller to further improve the identification of inconsistent overlaps in the string graph. We use a grouping method to assign reads to different haplotype groups. PECAT efficiently assembles diploid genomes using only Nanopore R9, PacBio CLR or Nanopore R10 reads. PECAT generates more contiguous haplotype-specific contigs than other assemblers. In particular, PECAT achieves a nearly haplotype-resolved assembly of B. taurus (Bison×Simmental) using Nanopore R9 reads, and phase block NG50s of 59.4/58.0 Mb for HG002 using Nanopore R10 reads.
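Why haplotype-aware correction matters can be shown with a toy example. A plain majority vote over all supporting reads can overwrite a heterozygous allele when reads from the other haplotype outnumber reads from the target's own haplotype. Letting only the supporting reads most similar to the target vote avoids this in simple cases. This is a minimal sketch under stated assumptions, not PECAT's actual algorithm: reads are assumed to be pre-aligned, equal-length strings, and the function name `correct_read` and its `keep` parameter are hypothetical.

```python
from collections import Counter
from typing import List


def correct_read(target: str, supports: List[str], keep: int) -> str:
    """Toy haplotype-aware consensus on pre-aligned, equal-length reads.

    Hypothetical helper: supporting reads are ranked by how many columns
    they share with the target, and only the `keep` most similar ones vote
    (together with the target itself). Reads from the other haplotype
    differ at heterozygous sites, so they tend to be ranked lower and left out.
    """
    def similarity(read: str) -> int:
        # Number of aligned columns where the support matches the target.
        return sum(1 for a, b in zip(target, read) if a == b)

    chosen = sorted(supports, key=similarity, reverse=True)[:keep]
    voters = chosen + [target]
    corrected = []
    for i in range(len(target)):
        # Column-wise majority vote among the selected reads.
        votes = Counter(read[i] for read in voters)
        corrected.append(votes.most_common(1)[0][0])
    return "".join(corrected)


if __name__ == "__main__":
    # Made-up haplotypes differing at index 3 (T vs A); the target comes from
    # haplotype 1 and carries a sequencing error at index 6 (G -> A).
    target = "ACGTACAT"
    supports = ["ACGTACGT"] * 2 + ["ACGAACGT"] * 4
    print(correct_read(target, supports, keep=2))              # ACGTACGT
    print(correct_read(target, supports, keep=len(supports)))  # ACGAACGT
```

With `keep=2` the error is fixed and the haplotype-1 allele at index 3 survives; letting every read vote also fixes the error but replaces that allele with the more abundant haplotype-2 base.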
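An overlap between two reads can be flagged as inconsistent when the reads carry different alleles at the heterozygous SNP sites they share, since that suggests they come from different haplotypes. A minimal sketch of such a check follows. It assumes the SNP sites have already been called (the abstract does not say how PECAT merges its corrected-read and raw-read callers). It represents each read as a hypothetical mapping from SNP position to observed base, and its thresholds (`min_shared`, `max_mismatch_frac`) are illustrative values, not PECAT's.

```python
from typing import Dict

# Hypothetical read representation: heterozygous SNP position -> observed base.
Alleles = Dict[int, str]


def is_inconsistent(read_a: Alleles, read_b: Alleles,
                    max_mismatch_frac: float = 0.2, min_shared: int = 2) -> bool:
    """Flag an overlap whose reads disagree at too many shared SNP sites."""
    shared = read_a.keys() & read_b.keys()
    if len(shared) < min_shared:
        # Too few shared sites to call the overlap inconsistent.
        return False
    mismatches = sum(1 for pos in shared if read_a[pos] != read_b[pos])
    return mismatches / len(shared) > max_mismatch_frac


if __name__ == "__main__":
    a = {100: "A", 250: "C", 400: "G"}
    b = {100: "T", 250: "G", 400: "G"}            # disagrees at 2 of 3 sites
    c = {100: "A", 250: "C", 400: "G", 500: "T"}  # agrees at all shared sites
    print(is_inconsistent(a, b))  # True
    print(is_inconsistent(a, c))  # False
```

In a string graph, overlaps flagged this way would be candidates for removal so that paths do not switch between haplotypes.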
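Phase block NG50 is a length statistic. Sort the phase blocks from longest to shortest and report the length of the block at which their cumulative length first reaches half of the genome size. Unlike N50, the denominator is the genome size rather than the total assembly length. A minimal sketch follows, with a hypothetical function name and made-up block lengths. Reading the abstract's two HG002 values (59.4/58.0 Mb) as one per haplotype is an assumption here.

```python
from typing import List, Optional


def ng50(block_lengths: List[float], genome_size: float) -> Optional[float]:
    """Return the NG50 of phase block (or contig) lengths.

    NG50 is the largest length L such that blocks of length >= L together
    cover at least half of the genome size. Returns None when all blocks
    together cover less than half of the genome.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    target = genome_size / 2
    covered = 0.0
    for length in sorted(block_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None


if __name__ == "__main__":
    blocks = [60, 40, 30, 20, 10]  # made-up phase block lengths in Mb
    print(ng50(blocks, genome_size=200))  # 40
```

Here the two longest blocks sum to 100 Mb, exactly half of the 200 Mb genome, so NG50 is the length of the second block.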