Alternating EM algorithm for a bilinear model in isoform quantification from RNA-seq data

Times cited: 13
Authors
Deng, Wenjiang [1]
Mou, Tian [1]
Kalari, Krishna R. [2]
Niu, Nifang [3]
Wang, Liewei [3]
Pawitan, Yudi [1]
Trung Nghia Vu [1]
Affiliations
[1] Karolinska Institutet, Department of Medical Epidemiology and Biostatistics, S-17177 Stockholm, Sweden
[2] Mayo Clinic, Department of Health Sciences Research, Rochester, MN 55905, USA
[3] Mayo Clinic, Department of Molecular Pharmacology and Experimental Therapeutics, Rochester, MN 55905, USA
Funding
Swedish Research Council
Keywords
EXPRESSION; ALIGNMENT; KINASE; READS
DOI
10.1093/bioinformatics/btz640
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Estimation of isoform-level gene expression from RNA-seq data depends on simplifying assumptions, such as a uniform read distribution, that are easily violated in real data. Such violations typically lead to biased estimates. Most existing methods provide one or more bias-correction steps, which are based on biological considerations such as GC content and are applied to each sample separately. The main problem is that not all biases are known.

Results: We have developed a novel method called XAEM based on a more flexible and robust statistical model. Existing methods are essentially based on a linear model Xβ, where the design matrix X is known and is computed from the simplifying assumptions. In contrast, XAEM treats Xβ as a bilinear model with both X and β unknown. Joint estimation of X and β is made possible by a simultaneous analysis of multi-sample RNA-seq data. Compared with existing methods, XAEM automatically performs empirical correction of potentially unknown biases. We use an alternating expectation-maximization (AEM) algorithm that alternates between estimation of X and β. For speed, XAEM uses quasi-mapping for read alignment, which makes the overall algorithm fast. Overall, XAEM performs favorably compared with recent advanced methods. On simulated datasets, XAEM obtains higher accuracy for multiple-isoform genes. In a differential-expression analysis of a real single-cell RNA-seq dataset, XAEM achieves substantially better rediscovery rates in independent validation sets.
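The abstract describes the alternating EM only at a high level. The following is a minimal sketch of the general idea under illustrative assumptions not taken from the paper: counts Y (equivalence classes × samples) are Poisson with mean XB, each column of X sums to 1 and its zeros mark classes an isoform cannot produce, and B holds per-sample isoform abundances (β for all samples). It is plain Python, not XAEM's implementation, and the function names (`fitted`, `loglik`, `update_B`, `update_X`, `alternating_em`) are hypothetical. With X held fixed, `update_B` is the familiar per-sample EM used when the design matrix is treated as known; `update_X` is the extra step that pools all samples to re-estimate X.

```python
import math

# Illustrative assumptions (not from the XAEM paper):
#   Y[i][j]  read count of equivalence class i in sample j
#   X[i][k]  probability that a read from isoform k falls in class i;
#            each column of X sums to 1, zeros mark incompatible pairs
#   B[k][j]  expected number of reads from isoform k in sample j
# Model: Y[i][j] ~ Poisson(mu[i][j]) with mu = X B (bilinear in X and B).


def fitted(X, B):
    """Return mu[i][j] = sum_k X[i][k] * B[k][j]."""
    K, m = len(B), len(B[0])
    return [[sum(X[i][k] * B[k][j] for k in range(K)) for j in range(m)]
            for i in range(len(X))]


def loglik(Y, X, B):
    """Poisson log-likelihood without the log(y!) constant.

    Cells with mu == 0 are skipped; this sketch assumes Y is 0 there."""
    total = 0.0
    for y_row, mu_row in zip(Y, fitted(X, B)):
        for y, mu in zip(y_row, mu_row):
            if mu > 0:
                total += y * math.log(mu) - mu
    return total


def update_B(Y, X, B):
    """One EM step for the abundances B with X held fixed.

    Because each column of X sums to 1, the M-step needs no extra
    normalisation: B[k][j] becomes the expected number of reads
    assigned to isoform k in sample j."""
    mu = fitted(X, B)
    n, K, m = len(X), len(B), len(B[0])
    return [[B[k][j] * sum(X[i][k] * Y[i][j] / mu[i][j]
                           for i in range(n) if mu[i][j] > 0)
             for j in range(m)] for k in range(K)]


def update_X(Y, X, B):
    """One EM step for X with B held fixed, pooling all samples.

    Expected reads per (class, isoform) are summed over samples, then
    each column is renormalised to sum to 1. Zero entries stay zero."""
    mu = fitted(X, B)
    n, K, m = len(X), len(B), len(B[0])
    Z = [[X[i][k] * sum(B[k][j] * Y[i][j] / mu[i][j]
                        for j in range(m) if mu[i][j] > 0)
          for k in range(K)] for i in range(n)]
    new_X = [row[:] for row in X]
    for k in range(K):
        col_total = sum(Z[i][k] for i in range(n))
        if col_total > 0:  # keep the old column if isoform k got no reads
            for i in range(n):
                new_X[i][k] = Z[i][k] / col_total
    return new_X


def alternating_em(Y, X0, n_iter=200, inner=5):
    """Alternate `inner` EM steps for B with one EM step for X."""
    X = [row[:] for row in X0]
    B = [[1.0] * len(Y[0]) for _ in range(len(X0[0]))]
    for _ in range(n_iter):
        for _ in range(inner):
            B = update_B(Y, X, B)
        X = update_X(Y, X, B)
    return X, B


if __name__ == "__main__":
    # Made-up toy counts: 3 equivalence classes, 2 isoforms, 4 samples.
    Y = [[70, 35, 7, 56], [38, 39, 39, 40], [12, 36, 54, 24]]
    X0 = [[0.5, 0.0], [0.5, 0.5], [0.0, 0.5]]
    X, B = alternating_em(Y, X0)
    print([[round(x, 3) for x in row] for row in X])
```

Each half-step is an ordinary EM update for one factor with the other held fixed, so the Poisson log-likelihood should not decrease from one update to the next. Estimating X this way only works because the expected counts are pooled across samples; with a single sample, X and B could not be separated.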
Pages: 805-812
Number of pages: 8