Overlap-Based Genome Assembly from Variable-Length Reads

Authors
Hui, Joseph [1]
Shomorony, Ilan [1]
Ramchandran, Kannan [1]
Courtade, Thomas A. [1]
Affiliations
[1] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Recently developed high-throughput sequencing platforms can generate very long reads, making the perfect assembly of whole genomes information-theoretically possible [1]. One of the challenges in achieving this goal in practice, however, is that traditional assembly algorithms based on the de Bruijn graph framework cannot handle the high error rates of long-read technologies. On the other hand, overlap-based approaches such as string graphs [2] are very robust to errors, but cannot achieve the theoretical lower bounds. In particular, these methods handle the variable-length reads provided by long-read technologies in a suboptimal manner. In this work, we introduce a new assembly algorithm with two desirable features in the context of long-read sequencing: (1) it is an overlap-based method, thus being more resilient to read errors than de Bruijn graph approaches; and (2) it achieves the information-theoretic bounds even in the variable-length read setting.
Pages: 1018-1022
Page count: 5