Ratatosk: hybrid error correction of long reads enables accurate variant calling and assembly

Cited by: 32
Authors
Holley, Guillaume [1]
Beyter, Doruk [1]
Ingimundardottir, Helga [1]
Moller, Peter L. [2]
Kristmundsdottir, Snodis [1,3]
Eggertsson, Hannes P. [1]
Halldorsson, Bjarni V. [1,3]
Affiliations
[1] Amgen Inc, deCODE Genet, Reykjavik, Iceland
[2] Aarhus Univ, Dept Biomed, Aarhus, Denmark
[3] Reykjavik Univ, Sch Technol, Reykjavik, Iceland
Keywords
GENOME; LIBRARY
DOI
10.1186/s13059-020-02244-4
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
A major challenge with long-read sequencing data is its high error rate of up to 15%. We present Ratatosk, a method to correct long reads with short read data. We demonstrate on 5 human genome trios that Ratatosk reduces the error rate of long reads 6-fold on average, with a median error rate as low as 0.22%. SNP calls in Ratatosk-corrected reads are nearly 99% accurate, and indel calling accuracy is increased by up to 37%. An assembly of Ratatosk-corrected reads from an Ashkenazi individual yields a contig N50 of 45 Mbp and fewer misassemblies than a PacBio HiFi reads assembly.
Pages: 22