VeChat: correcting errors in long reads using variation graphs

Cited by: 10
Authors
Luo, Xiao [1,2]
Kang, Xiongbin [1]
Schoenhuth, Alexander [1,2]
Affiliations
[1] Bielefeld Univ, Fac Technol, Genome Data Sci, Bielefeld, Germany
[2] Ctr Wiskunde & Informat, Life Sci & Hlth, Amsterdam, Netherlands
Keywords
GENOME; ACCURATE;
DOI
10.1038/s41467-022-34381-8
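Chinese Library Classification (CLC)
O [Mathematical and physical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes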
07 ; 0710 ; 09 ;
Abstract
Error correction is the canonical first step in long-read sequencing data analysis. Current self-correction methods, however, suffer from biases induced by consensus sequences, which mask true variants in the lower-frequency haplotypes of mixed samples. Unlike consensus sequence templates, graph-based reference systems are not affected by such biases and so do not mistake true variants for errors. We present VeChat, an approach that implements this idea: VeChat is based on variation graphs, a popular data structure for pangenome reference systems. Extensive benchmarking experiments demonstrate that long reads corrected by VeChat contain 4 to 15 times (Pacific Biosciences) and 1 to 10 times (Oxford Nanopore Technologies) fewer errors than reads corrected by state-of-the-art approaches. Further, using VeChat prior to long-read assembly significantly improves the haplotype awareness of the assemblies. VeChat is an easy-to-use, open-source tool, publicly available at https://github.com/HaploKit/vechat.
Pages: 12
Related papers
50 in total
  • [21] Pangenome graphs in infectious disease: a comprehensive genetic variation analysis of Neisseria meningitidis leveraging Oxford Nanopore long reads
    Yang, Zuyu
    Guarracino, Andrea
    Biggs, Patrick J.
    Black, Michael A.
    Ismail, Nuzla
    Wold, Jana Renee
    Merriman, Tony R.
    Prins, Pjotr
    Garrison, Erik
    de Ligt, Joep
    FRONTIERS IN GENETICS, 2023, 14
  • [22] ARAMIS: From systematic errors of NGS long reads to accurate assemblies
    Sacristan-Horcajada, E.
    Gonzalez-de la Fuente, S.
    Peiro-Pastor, R.
    Carrasco-Ramiro, F.
    Amils, R.
    Requena, J. M.
    Berenguer, J.
    Aguado, B.
    BRIEFINGS IN BIOINFORMATICS, 2021, 22 (06)
  • [23] LcDel: deletion variation detection based on clustering and long reads
    Yu, Yanan
    Gao, Runtian
    Luo, Junwei
    FRONTIERS IN GENETICS, 2024, 15
  • [24] A hybrid and scalable error correction algorithm for indel and substitution errors of long reads
    Arghya Kusum Das
    Sayan Goswami
    Kisung Lee
    Seung-Jong Park
    BMC Genomics, 20
  • [25] A hybrid and scalable error correction algorithm for indel and substitution errors of long reads
    Das, Arghya Kusum
    Goswami, Sayan
    Lee, Kisung
    Park, Seung-Jong
    BMC GENOMICS, 2019, 20 (Suppl 11)
  • [26] Correcting ESL Errors Using Phrasal SMT Techniques
    Brockett, Chris
    Dolan, William B.
    Gamon, Michael
    COLING/ACL 2006, VOLS 1 AND 2, PROCEEDINGS OF THE CONFERENCE, 2006, : 249 - 256
  • [27] Correcting Adjacent Errors Using Permutation Code Trees
    Heymann, R.
    Swart, T. G.
    Ferreira, H. C.
    IEEE AFRICON 2011, 2011
  • [28] Blue: correcting sequencing errors using consensus and context
    Greenfield, Paul
    Duesing, Konsta
    Papanicolaou, Alexie
    Bauer, Denis C.
    BIOINFORMATICS, 2014, 30 (19) : 2723 - 2732
  • [29] Reconstructing viral haplotypes using long reads
    Cai, Dehan
    Sun, Yanni
    BIOINFORMATICS, 2022, 38 (08) : 2127 - 2134
  • [30] Inversion Detection Using PacBio Long Reads
    Zhu, Shenglong
    Emrich, Scott J.
    Chen, Danny Z.
    2017 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2017, : 237 - 242