Optimization of de novo transcriptome assembly from next-generation sequencing data

Cited by: 266
Authors
Surget-Groba, Yann [1]
Montoya-Burgos, Juan I. [1]
Affiliation
[1] Univ Geneva, Dept Zool & Anim Biol, CH-1211 Geneva 4, Switzerland
Keywords
RNA-SEQ; MODEL ORGANISMS; GENE DISCOVERY; LARGE SETS; GENOME; RESOLUTION; ALLPATHS; PROGRAM; PROTEIN; FISHES;
DOI
10.1101/gr.103846.109
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Transcriptome analysis has important applications in many biological fields. However, assembling a transcriptome without a known reference remains a challenging task requiring algorithmic improvements. We present two methods for substantially improving de novo transcriptome assembly. The first method relies on the observation that the use of a single k-mer length by current de novo assemblers is suboptimal for assembling transcriptomes in which the sequence coverage of transcripts is highly heterogeneous. We present the Multiple-k method, in which various k-mer lengths are used for de novo transcriptome assembly. We demonstrate its good performance by assembling de novo a published next-generation transcriptome sequence data set of Aedes aegypti, using the existing genome to check the accuracy of our method. The second method relies on the use of a reference proteome to improve the de novo assembly. We developed the Scaffolding using Translation Mapping (STM) method, which uses mapping against the closest available reference proteome to scaffold contigs that map onto the same protein. In a controlled experiment using simulated data, we show that the STM method considerably improves the assembly, with few errors. We applied these two methods to assemble the transcriptome of the non-model catfish Loricaria gr. cataphracta. Using the Multiple-k and STM methods, the assembly increases in contiguity and in gene identification, showing that our methods clearly improve quality and can be widely used. The new methods were used to successfully assemble the transcripts of the core set of genes regulating tooth development in vertebrates, whereas classic de novo assembly failed.
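The Multiple-k idea described in the abstract can be illustrated with a minimal sketch. It assumes that contigs have already been produced by running an assembler at several k-mer lengths, and it pools them by discarding any contig that is fully contained, on either strand, in a longer one. The function names `reverse_complement` and `merge_multi_k` and the containment rule are illustrative assumptions, not the authors' published pipeline, which may merge the per-k assemblies differently.

```python
from typing import Dict, List

# Complement table for DNA bases; N is kept as N.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def merge_multi_k(contigs_by_k: Dict[int, List[str]]) -> List[str]:
    """Pool contigs from assemblies run at different k-mer lengths.

    Hypothetical merging rule (an assumption for illustration): process
    contigs from longest to shortest and keep a contig only if neither it
    nor its reverse complement is a substring of a contig already kept.
    """
    pooled = {c.upper() for contigs in contigs_by_k.values() for c in contigs}
    # Longest first; ties broken alphabetically so the result is deterministic.
    ordered = sorted(pooled, key=lambda s: (-len(s), s))

    kept: List[str] = []
    for contig in ordered:
        rc = reverse_complement(contig)
        if not any(contig in longer or rc in longer for longer in kept):
            kept.append(contig)
    return kept


if __name__ == "__main__":
    # Toy input: keys are k-mer lengths, values are contigs from that run.
    toy = {
        21: ["ACGTTGCA", "GGGAAA"],
        31: ["ACGTTGCATT", "TTTCCC"],
    }
    print(merge_multi_k(toy))
```

In this toy input, the shorter contig from the k=21 run is absorbed by the longer contig from the k=31 run, and the two contigs that are reverse complements of each other collapse into one. A real workflow would more likely rely on a dedicated clustering or re-assembly step than on exact substring matching.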
Pages: 1432-1440
Page count: 9