Alternating EM algorithm for a bilinear model in isoform quantification from RNA-seq data

Cited by: 13
Authors
Deng, Wenjiang [1 ]
Mou, Tian [1 ]
Kalari, Krishna R. [2 ]
Niu, Nifang [3 ]
Wang, Liewei [3 ]
Pawitan, Yudi [1 ]
Trung Nghia Vu [1 ]
Affiliations
[1] Karolinska Inst, Dept Med Epidemiol & Biostat, S-17177 Stockholm, Sweden
[2] Mayo Clin, Dept Hlth Sci Res, Rochester, MN 55905 USA
[3] Mayo Clin, Dept Mol Pharmacol & Expt Therapeut, Rochester, MN 55905 USA
Funding
Swedish Research Council
Keywords
EXPRESSION; ALIGNMENT; KINASE; READS;
DOI
10.1093/bioinformatics/btz640
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: Estimation of isoform-level gene expression from RNA-seq data depends on simplifying assumptions, such as uniform read distribution, that are easily violated in real data. Such violations typically lead to biased estimates. Most existing methods provide one or more bias-correction steps, which are based on biological considerations, such as GC content, and are applied to each sample separately. The main problem is that not all biases are known.

Results: We have developed a novel method called XAEM based on a more flexible and robust statistical model. Existing methods are essentially based on a linear model X beta, where the design matrix X is known and is computed based on the simplifying assumptions. In contrast, XAEM considers X beta as a bilinear model with both X and beta unknown. Joint estimation of X and beta is made possible by a simultaneous analysis of multi-sample RNA-seq data. Compared to existing methods, XAEM automatically performs empirical correction of potentially unknown biases. We use an alternating expectation-maximization (AEM) algorithm, alternating between estimation of X and beta. For speed, XAEM utilizes quasi-mapping for read alignment, thus leading to a fast algorithm. Overall, XAEM performs favorably compared to recent advanced methods. For simulated datasets, XAEM obtains higher accuracy for multiple-isoform genes. In a differential-expression analysis of a real single-cell RNA-seq dataset, XAEM achieves substantially better rediscovery rates in independent validation sets.
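The alternating idea in the abstract can be illustrated with a small toy sketch. This is not the XAEM implementation. It assumes a Poisson model in which the count matrix Y (read classes by samples) has mean X beta, with each column of X constrained to sum to 1, and it alternates EM-style updates of beta (with X fixed) and X (with beta fixed). The function names, the initialisation, and the toy data are all hypothetical, and the sketch does not address identifiability beyond the column normalisation.

```python
import math
import random

EPS = 1e-12  # guard against division by zero and log(0)

def expected_counts(X, beta):
    """Return mu with mu[j][s] = sum_k X[j][k] * beta[k][s]."""
    K, S = len(beta), len(beta[0])
    return [[sum(row[k] * beta[k][s] for k in range(K)) for s in range(S)]
            for row in X]

def log_likelihood(Y, X, beta):
    """Poisson log-likelihood of Y given mean X beta, up to a constant."""
    mu = expected_counts(X, beta)
    return sum(y * math.log(max(m, EPS)) - m
               for y_row, m_row in zip(Y, mu)
               for y, m in zip(y_row, m_row))

def update_beta(Y, X, beta):
    """EM update of beta with X held fixed (columns of X sum to 1)."""
    mu = expected_counts(X, beta)
    J, K, S = len(X), len(beta), len(beta[0])
    return [[beta[k][s] * sum(X[j][k] * Y[j][s] / max(mu[j][s], EPS)
                              for j in range(J))
             for s in range(S)] for k in range(K)]

def update_X(Y, X, beta):
    """EM update of X with beta held fixed, then column normalisation."""
    mu = expected_counts(X, beta)
    J, K, S = len(X), len(beta), len(beta[0])
    raw = [[X[j][k] * sum(beta[k][s] * Y[j][s] / max(mu[j][s], EPS)
                          for s in range(S))
            for k in range(K)] for j in range(J)]
    col = [max(sum(raw[j][k] for j in range(J)), EPS) for k in range(K)]
    return [[raw[j][k] / col[k] for k in range(K)] for j in range(J)]

def fit(Y, n_isoforms, n_iter=50, seed=0):
    """Alternate beta and X updates; return X, beta and the log-likelihood trace."""
    rng = random.Random(seed)
    J, S, K = len(Y), len(Y[0]), n_isoforms
    # Hypothetical initialisation: random positive X, equal split of totals for beta.
    X = [[rng.random() + 0.1 for _ in range(K)] for _ in range(J)]
    col = [sum(X[j][k] for j in range(J)) for k in range(K)]
    X = [[X[j][k] / col[k] for k in range(K)] for j in range(J)]
    totals = [sum(Y[j][s] for j in range(J)) for s in range(S)]
    beta = [[totals[s] / K for s in range(S)] for _ in range(K)]
    trace = [log_likelihood(Y, X, beta)]
    for _ in range(n_iter):
        beta = update_beta(Y, X, beta)  # estimate beta given X
        X = update_X(Y, X, beta)        # estimate X given beta
        trace.append(log_likelihood(Y, X, beta))
    return X, beta, trace

# Toy data: 3 read classes, 2 isoforms, 4 samples (noise-free expected counts).
X_true = [[0.5, 0.1], [0.3, 0.2], [0.2, 0.7]]
beta_true = [[100, 20, 60, 10], [30, 80, 40, 90]]
Y = expected_counts(X_true, beta_true)
X_hat, beta_hat, trace = fit(Y, n_isoforms=2)
```

In this sketch, several samples share one X, which is what allows X to be estimated at all: with a single sample, X and beta could not be separated. Each half-step is an EM update for one block of parameters under the assumed Poisson model, so the log-likelihood trace should not decrease from one iteration to the next.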
Pages: 805-812
Number of pages: 8