VeChat: correcting errors in long reads using variation graphs

Cited by: 10
Authors
Luo, Xiao [1, 2]
Kang, Xiongbin [1]
Schoenhuth, Alexander [1, 2]
Affiliations
[1] Bielefeld Univ, Fac Technol, Genome Data Sci, Bielefeld, Germany
[2] Ctr Wiskunde & Informat, Life Sci & Hlth, Amsterdam, Netherlands
Keywords
GENOME; ACCURATE
DOI
10.1038/s41467-022-34381-8
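Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification codes
07; 0710; 09
Abstract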
Error correction is the canonical first step in long-read sequencing data analysis. Current self-correction methods, however, are affected by consensus-sequence-induced biases that mask true variants in lower-frequency haplotypes present in mixed samples. Unlike consensus sequence templates, graph-based reference systems are not affected by such biases, so they do not mistakenly mask true variants as errors. We present VeChat, an approach that implements this idea: VeChat is based on variation graphs, a popular type of data structure for pangenome reference systems. Extensive benchmarking experiments demonstrate that long reads corrected by VeChat contain 4 to 15 times (Pacific Biosciences) and 1 to 10 times (Oxford Nanopore Technologies) fewer errors than reads corrected by state-of-the-art approaches. Further, using VeChat prior to long-read assembly significantly improves the haplotype awareness of the assemblies. VeChat is an easy-to-use, open-source tool and is publicly available at https://github.com/HaploKit/vechat.
Pages: 12
Published in
Nature Communications, 13