A memory-efficient algorithm to obtain splicing graphs and de novo expression estimates from de Bruijn graphs of RNA-Seq data

Cited by: 3
Authors
Sze, Sing-Hoi [1 ,2 ]
Tarone, Aaron M. [3 ]
Affiliations
[1] Texas A&M Univ, Dept Comp Sci & Engn, College Stn, TX 77843 USA
[2] Texas A&M Univ, Dept Biochem & Biophys, College Stn, TX 77843 USA
[3] Texas A&M Univ, Dept Entomol, College Stn, TX 77843 USA
Source
BMC GENOMICS | 2014 / Vol. 15
Keywords
Alternative Splice; Transcriptome Assembly; Forward Node; Postprocessing Algorithm; Differential Expression Level;
DOI
10.1186/1471-2164-15-S5-S6
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: Recent advances in high-throughput sequencing make it feasible to study entire transcriptomes by applying de novo sequence assembly algorithms. A popular strategy is to first construct an intermediate de Bruijn graph that represents the transcriptome, but an additional step is then needed to construct predicted transcripts from the graph.

Results: Since the de Bruijn graph contains all branching possibilities, we develop a memory-efficient algorithm to recover alternative splicing information and library-specific expression information directly from the graph without prior genomic knowledge. We implement the algorithm as a postprocessing module of the Velvet assembler. We validate our algorithm by simulating the transcriptome assembly of Drosophila using its known genome, and by performing Drosophila transcriptome assembly using publicly available RNA-Seq libraries. Under a range of conditions, our algorithm recovers sequences and alternative splicing junctions with higher specificity than Oases or Trans-ABySS.

Conclusions: Since our postprocessing algorithm does not consume as much memory as Velvet and is less memory-intensive than Oases, it allows biologists to assemble large libraries with limited computational resources. Our algorithm has been applied to the transcriptome assembly of the non-model blow fly Lucilia sericata, reported in a previous article. That work shows that the assembly is of high quality and that it facilitates comparison of the Lucilia sericata transcriptome to those of Drosophila and two mosquitoes, prediction and experimental validation of alternative splicing, investigation of differential expression among various developmental stages, and identification of transposable elements.
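To illustrate what "the de Bruijn graph contains all branching possibilities" means in practice, the following is a minimal sketch, not the authors' algorithm and not part of Velvet. It builds a toy de Bruijn graph from reads and reports nodes with more than one successor, which are the places where alternative splicing junctions could show up. The function names `build_de_bruijn` and `branching_nodes`, the toy reads, and the choice of k = 4 are all assumptions made for illustration; the sketch ignores reverse complements, sequencing errors, and library-specific counts.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to a dict of successor node -> k-mer count.

    Hypothetical helper for illustration only.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix node to its suffix node.
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


def branching_nodes(graph):
    """Return nodes with more than one successor (candidate junctions)."""
    return {node: dict(succ) for node, succ in graph.items() if len(succ) > 1}


# Two toy "isoforms" that share a prefix and then diverge.
reads = ["ATGGCGTACCA", "ATGGCGTTGAC"]
graph = build_de_bruijn(reads, k=4)
branches = branching_nodes(graph)
print(branches)
```

In this toy input the only branching node is the 3-mer at the point where the two sequences diverge, and the edge counts on the shared prefix are twice those on each branch, which hints at how per-edge k-mer counts could feed an expression estimate.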
Pages: 12