VeChat: correcting errors in long reads using variation graphs

Cited by: 10
Authors
Luo, Xiao [1, 2]
Kang, Xiongbin [1]
Schoenhuth, Alexander [1, 2]
Affiliations
[1] Bielefeld Univ, Fac Technol, Genome Data Sci, Bielefeld, Germany
[2] Ctr Wiskunde & Informat, Life Sci & Hlth, Amsterdam, Netherlands
Keywords
GENOME; ACCURATE;
DOI
10.1038/s41467-022-34381-8
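Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code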
07 ; 0710 ; 09 ;
Abstract
Error correction is the canonical first step in long-read sequencing data analysis. Current self-correction methods, however, suffer from biases induced by consensus sequences, which mask true variants in lower-frequency haplotypes in mixed samples. Unlike consensus sequence templates, graph-based reference systems are not affected by such biases and therefore do not mistake true variants for errors. We present VeChat, an approach that implements this idea: VeChat is based on variation graphs, a popular data structure for pangenome reference systems. Extensive benchmarking experiments demonstrate that long reads corrected by VeChat contain 4 to 15 times (Pacific Biosciences) and 1 to 10 times (Oxford Nanopore Technologies) fewer errors than reads corrected by state-of-the-art approaches. Further, using VeChat prior to long-read assembly significantly improves the haplotype awareness of the assemblies. VeChat is an easy-to-use, open-source tool, publicly available at https://github.com/HaploKit/vechat.
Pages: 12