Estimation of alternative splicing isoform frequencies from RNA-Seq data

Cited by: 93
Authors
Nicolae, Marius [1]
Mangul, Serghei [2]
Mandoiu, Ion I. [1]
Zelikovsky, Alex [2]
Affiliations
[1] Univ Connecticut, Dept Comp Sci & Engn, Storrs, CT 06269 USA
[2] Georgia State Univ, Dept Comp Sci, Atlanta, GA 30303 USA
Funding
National Science Foundation (USA);
Keywords
SHORT SEQUENCE READS; EXPRESSION LEVELS; GENE-EXPRESSION; TRANSCRIPTOME; QUANTIFICATION; RECONSTRUCTION; REVEALS; GENOME;
DOI
10.1186/1748-7188-6-9
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: Massively parallel whole transcriptome sequencing, commonly referred to as RNA-Seq, is quickly becoming the technology of choice for gene expression profiling. However, due to the short read length delivered by current sequencing technologies, estimating expression levels for alternatively spliced gene isoforms remains challenging. Results: In this paper we present a novel expectation-maximization algorithm for inference of isoform- and gene-specific expression levels from RNA-Seq data. Our algorithm, referred to as IsoEM, is based on disambiguating information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand information, and read pairing information when available. The open source Java implementation of IsoEM is freely available at http://dna.engr.uconn.edu/software/IsoEM/. Conclusions: Empirical experiments on both synthetic and real RNA-Seq datasets show that IsoEM has scalable running time and outperforms existing methods of isoform and gene expression level estimation. Simulation experiments confirm previous findings that, for a fixed sequencing cost, using reads longer than 25-36 bases does not necessarily lead to better accuracy for estimating expression levels of annotated isoforms and genes.
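The expectation-maximization idea mentioned in the abstract can be illustrated with a minimal generic sketch. This is not the IsoEM implementation: the function name `em_isoform_frequencies`, the input layout, and the per-read weights are all hypothetical. The sketch assumes each read comes with a weight for every isoform it is compatible with (in a fuller model such weights might reflect insert size, base quality, or strand), and it estimates only the fraction of reads attributed to each isoform, omitting isoform-length normalization.

```python
from typing import Dict, List


def em_isoform_frequencies(
    reads: List[Dict[str, float]],
    isoforms: List[str],
    max_iterations: int = 200,
    tol: float = 1e-9,
) -> Dict[str, float]:
    """Generic EM sketch (hypothetical, not IsoEM itself).

    reads[i] maps an isoform name to an assumed weight: how likely
    read i is to have been produced by that isoform.
    Returns the estimated fraction of reads coming from each isoform.
    """
    # Start from a uniform guess over all isoforms.
    freq = {iso: 1.0 / len(isoforms) for iso in isoforms}

    for _ in range(max_iterations):
        # E-step: split each read fractionally among compatible isoforms,
        # in proportion to current frequency times the read's weight.
        counts = {iso: 0.0 for iso in isoforms}
        for weights in reads:
            total = sum(freq[iso] * w for iso, w in weights.items())
            if total == 0.0:
                continue  # read is incompatible with every isoform
            for iso, w in weights.items():
                counts[iso] += freq[iso] * w / total

        # M-step: renormalize the fractional counts into frequencies.
        assigned = sum(counts.values())
        if assigned == 0.0:
            break
        new_freq = {iso: c / assigned for iso, c in counts.items()}

        # Stop once the estimates no longer change noticeably.
        delta = max(abs(new_freq[iso] - freq[iso]) for iso in isoforms)
        freq = new_freq
        if delta < tol:
            break

    return freq


# Toy input: 3 reads unique to isoform A, 1 unique to B, 2 ambiguous.
toy_reads = (
    [{"A": 1.0}] * 3
    + [{"B": 1.0}]
    + [{"A": 1.0, "B": 1.0}] * 2
)
toy_freq = em_isoform_frequencies(toy_reads, ["A", "B"])
```

In this toy input the ambiguous reads are shared between A and B according to the evidence from the uniquely assigned reads, which is the disambiguation step that EM-based estimators rely on.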
Pages: 13