VeChat: correcting errors in long reads using variation graphs

Times cited: 10

Authors:

Luo, Xiao [1,2]

Kang, Xiongbin [1]

Schoenhuth, Alexander [1,2]

Affiliations:

[1] Bielefeld Univ, Fac Technol, Genome Data Sci, Bielefeld, Germany

[2] Ctr Wiskunde & Informat, Life Sci & Hlth, Amsterdam, Netherlands

Source:

NATURE COMMUNICATIONS | 2022 / Vol. 13 / Issue 01

Keywords:

GENOME; ACCURATE;

DOI:

10.1038/s41467-022-34381-8

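Chinese Library Classification (CLC) codes:

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]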

Subject classification codes:

07; 0710; 09

Abstract:

Error correction is the canonical first step in long-read sequencing data analysis. Current self-correction methods, however, are affected by consensus-sequence-induced biases that mask true variants in lower-frequency haplotypes of mixed samples. Unlike consensus sequence templates, graph-based reference systems are not affected by such biases and therefore do not mistakenly mask true variants as errors. We present VeChat, an approach that implements this idea: VeChat is based on variation graphs, a popular type of data structure for pangenome reference systems. Extensive benchmarking experiments demonstrate that long reads corrected by VeChat contain 4 to 15 times (Pacific Biosciences) and 1 to 10 times (Oxford Nanopore Technologies) fewer errors than reads corrected by state-of-the-art approaches. Further, using VeChat prior to long-read assembly significantly improves the haplotype awareness of the assemblies. VeChat is an easy-to-use, open-source tool, publicly available at https://github.com/HaploKit/vechat.
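The consensus bias described in the abstract can be illustrated with a toy example. The Python sketch below is not VeChat's algorithm. It assumes the reads are already aligned with no indels, and it uses a per-column allele table with a hypothetical `min_support` threshold as a crude stand-in for a variation graph. Consensus correction overwrites the lower-frequency haplotype's true variant, whereas the graph-style version keeps every allele with enough read support and rewrites only rarely observed bases.

```python
from collections import Counter
from typing import List


def column_counts(reads: List[str]) -> List[Counter]:
    """Count the bases observed at each column of pre-aligned, equal-length reads."""
    return [Counter(column) for column in zip(*reads)]


def consensus_correct(reads: List[str]) -> List[str]:
    """Consensus-style correction: every read is replaced by the per-column majority base."""
    consensus = "".join(c.most_common(1)[0][0] for c in column_counts(reads))
    return [consensus for _ in reads]


def graph_correct(reads: List[str], min_support: int = 2) -> List[str]:
    """Toy graph-style correction (hypothetical simplification, not VeChat).

    Every allele seen in at least `min_support` reads is kept as a node at its
    column. Only bases whose allele falls below that threshold are treated as
    errors and replaced by the best-supported allele at that column.
    """
    counts = column_counts(reads)
    kept = [{base for base, n in c.items() if n >= min_support} for c in counts]
    corrected = []
    for read in reads:
        bases = []
        for i, base in enumerate(read):
            if base in kept[i]:
                # Supported allele: keep it even if it is not the majority base.
                bases.append(base)
            else:
                # Low-support base: treat it as a sequencing error.
                bases.append(counts[i].most_common(1)[0][0])
        corrected.append("".join(bases))
    return corrected


# Made-up toy data: haplotype A (4 reads) and haplotype B (2 reads) differ
# at column 3 (T vs A); the third read carries a sequencing error at column 6.
reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGTACCT",
    "ACGTACGT",
    "ACGAACGT",
    "ACGAACGT",
]

print(consensus_correct(reads)[4])  # ACGTACGT: haplotype B's variant is masked
print(graph_correct(reads)[4])      # ACGAACGT: haplotype B's variant is preserved
print(graph_correct(reads)[2])      # ACGTACGT: the sequencing error is corrected
```

The design choice the sketch highlights is that a single template can only hold one base per position, so any allele that is not the majority is indistinguishable from an error. A structure that can hold several supported alleles per position separates the two cases by read support instead.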


Pages: 12
