VeChat: correcting errors in long reads using variation graphs

Cited by: 10

Authors:

Luo, Xiao [1,2]

Kang, Xiongbin [1]

Schoenhuth, Alexander [1,2]

Affiliations:

[1] Bielefeld Univ, Fac Technol, Genome Data Sci, Bielefeld, Germany

[2] Ctr Wiskunde & Informat, Life Sci & Hlth, Amsterdam, Netherlands

Source:

NATURE COMMUNICATIONS | 2022 / Volume 13 / Issue 01

Keywords:

GENOME; ACCURATE;

DOI:

10.1038/s41467-022-34381-8

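Chinese Library Classification (CLC):

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];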

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Error correction is the canonical first step in long-read sequencing data analysis. Current self-correction methods, however, are affected by consensus-sequence-induced biases that mask true variants in lower-frequency haplotypes in mixed samples. Unlike consensus sequence templates, graph-based reference systems are not affected by such biases and therefore do not mistakenly mask true variants as errors. We present VeChat as an approach that implements this idea: VeChat is based on variation graphs, a popular type of data structure for pangenome reference systems. Extensive benchmarking experiments demonstrate that long reads corrected by VeChat contain 4 to 15 times (Pacific Biosciences) and 1 to 10 times (Oxford Nanopore Technologies) fewer errors than reads corrected by state-of-the-art approaches. Further, using VeChat prior to long-read assembly significantly improves the haplotype awareness of the assemblies. VeChat is an easy-to-use, open-source tool, publicly available at https://github.com/HaploKit/vechat.

Pages: 12
