ReMILO: reference assisted misassembly detection algorithm using short and long reads

Cited by: 7
Authors
Bao, Ergude [1, 2]
Song, Changjin [1]
Lan, Lingxiao [1]
Affiliations
[1] Beijing Jiaotong Univ, Sch Software Engn, Software Engn Res Ctr, Beijing 100044, Peoples R China
[2] Univ Calif Riverside, Dept Bot & Plant Sci, Riverside, CA 92521 USA
Funding
U.S. National Science Foundation
Keywords
GENOME ASSEMBLIES; SINGLE-CELL; ALIGNMENT;
DOI
10.1093/bioinformatics/btx524
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Contigs assembled from second-generation sequencing short reads may contain misassemblies, which complicate downstream analysis or even lead to incorrect results. As more species are sequenced, it becomes possible to use the reference genome of a closely related species to detect misassemblies. In addition, long reads from third-generation sequencing technology are increasingly widely used and can also help detect misassemblies.
Results: Here, we introduce ReMILO, a reference-assisted misassembly detection algorithm that uses both short reads and PacBio SMRT long reads. ReMILO aligns the initial short reads to both the contigs and the reference genome, and then constructs a novel data structure called the red-black multipositional de Bruijn graph to detect misassemblies. ReMILO also aligns the contigs to the long reads and uses the differences between them to detect additional misassemblies. In our performance test on short-read assemblies of human chromosome 14 data, ReMILO detects 41.8-77.9% of extensive misassemblies and 33.6-54.5% of local misassemblies. On hybrid short- and long-read assemblies of S. pastorianus data, ReMILO detects 60.6-70.9% of extensive misassemblies and 28.6-54.0% of local misassemblies.
Availability and implementation: The ReMILO software can be downloaded for free under the Artistic License 2.0 from https://github.com/songc001/remilo.
Contact: baoe@bjtu.edu.cn
Supplementary information: Supplementary data are available at Bioinformatics online.
Pages: 24-32
Page count: 9