ReMILO: reference assisted misassembly detection algorithm using short and long reads

Cited by: 7
Authors
Bao, Ergude [1 ,2 ]
Song, Changjin [1 ]
Lan, Lingxiao [1 ]
Affiliations
[1] Beijing Jiaotong Univ, Sch Software Engn, Software Engn Res Ctr, Beijing 100044, Peoples R China
[2] Univ Calif Riverside, Dept Bot & Plant Sci, Riverside, CA 92521 USA
Funding
National Science Foundation (USA);
Keywords
GENOME ASSEMBLIES; SINGLE-CELL; ALIGNMENT;
DOI
10.1093/bioinformatics/btx524
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Contigs assembled from second-generation sequencing short reads may contain misassemblies, which complicate downstream analysis or even lead to incorrect results. As more sequenced species become available, the reference genome of a closely related species can be used to detect misassemblies. Long reads from third-generation sequencing technology are also increasingly widely used and can likewise help detect misassemblies.
Results: Here, we introduce ReMILO, a reference-assisted misassembly detection algorithm that uses both short reads and PacBio SMRT long reads. ReMILO aligns the initial short reads to both the contigs and the reference genome, and then constructs a novel data structure called the red-black multipositional de Bruijn graph to detect misassemblies. ReMILO also aligns the contigs to the long reads and identifies where the contigs differ from them to detect further misassemblies. In our performance test on short-read assemblies of human chromosome 14 data, ReMILO detects 41.8-77.9% of extensive misassemblies and 33.6-54.5% of local misassemblies. On hybrid short- and long-read assemblies of S. pastorianus data, ReMILO detects 60.6-70.9% of extensive misassemblies and 28.6-54.0% of local misassemblies.
Availability and implementation: The ReMILO software can be downloaded for free under Artistic License 2.0 from https://github.com/songc001/remilo.
Contact: baoe@bjtu.edu.cn
Supplementary information: Supplementary data are available at Bioinformatics online.
Pages: 24-32
Page count: 9