VeChat: correcting errors in long reads using variation graphs

Cited by: 10

Authors:

Luo, Xiao [1,2]

Kang, Xiongbin [1]

Schoenhuth, Alexander [1,2]

Affiliations:

[1] Bielefeld Univ, Fac Technol, Genome Data Sci, Bielefeld, Germany

[2] Ctr Wiskunde & Informat, Life Sci & Hlth, Amsterdam, Netherlands

Source:

NATURE COMMUNICATIONS | 2022 / Volume 13 / Issue 01

Keywords:

GENOME; ACCURATE;

DOI:

10.1038/s41467-022-34381-8

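Chinese Library Classification (CLC):

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];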

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Error correction is the canonical first step in long-read sequencing data analysis. Current self-correction methods, however, are affected by consensus-sequence-induced biases that mask true variants in lower-frequency haplotypes that occur in mixed samples. Unlike consensus sequence templates, graph-based reference systems are not affected by such biases, so they do not mistakenly mask true variants as errors. We present VeChat, an approach that implements this idea: VeChat is based on variation graphs, a popular type of data structure for pangenome reference systems. Extensive benchmarking experiments demonstrate that long reads corrected by VeChat contain 4 to 15 times (Pacific Biosciences) and 1 to 10 times (Oxford Nanopore Technologies) fewer errors than reads corrected by state-of-the-art approaches. Further, using VeChat prior to long-read assembly significantly improves the haplotype awareness of the assemblies. VeChat is an easy-to-use, open-source tool, publicly available at https://github.com/HaploKit/vechat.


Pages: 12
