Alternating EM algorithm for a bilinear model in isoform quantification from RNA-seq data

Cited by: 13
Authors
Deng, Wenjiang [1]
Mou, Tian [1]
Kalari, Krishna R. [2]
Niu, Nifang [3]
Wang, Liewei [3]
Pawitan, Yudi [1]
Vu, Trung Nghia [1]
Affiliations
[1] Karolinska Inst, Dept Med Epidemiol & Biostat, S-17177 Stockholm, Sweden
[2] Mayo Clin, Dept Hlth Sci Res, Rochester, MN 55905 USA
[3] Mayo Clin, Dept Mol Pharmacol & Expt Therapeut, Rochester, MN 55905 USA
Funding
Swedish Research Council
Keywords
EXPRESSION; ALIGNMENT; KINASE; READS;
DOI
10.1093/bioinformatics/btz640
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Motivation: Estimation of isoform-level gene expression from RNA-seq data depends on simplifying assumptions, such as uniform read distribution, that are easily violated in real data. Such violations typically lead to biased estimates. Most existing methods provide one or more bias-correction steps, which are based on biological considerations, such as GC content, and are applied to each sample separately. The main problem is that not all biases are known.

Results: We have developed a novel method called XAEM based on a more flexible and robust statistical model. Existing methods are essentially based on a linear model Xβ, where the design matrix X is known and is computed under the simplifying assumptions. In contrast, XAEM treats Xβ as a bilinear model in which both X and β are unknown. Joint estimation of X and β is made possible by a simultaneous analysis of multi-sample RNA-seq data. Compared to existing methods, XAEM automatically performs empirical correction of potentially unknown biases. We use an alternating expectation-maximization (AEM) algorithm, which alternates between estimation of X and estimation of β. For speed, XAEM uses quasi-mapping for read alignment. Overall, XAEM performs favorably compared to recent advanced methods. For simulated datasets, XAEM obtains higher accuracy for multiple-isoform genes. In a differential-expression analysis of a real single-cell RNA-seq dataset, XAEM achieves substantially better rediscovery rates in independent validation sets.
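The alternating idea in the abstract can be illustrated with a small toy sketch. This is not the XAEM implementation: it assumes a Poisson model for read counts, uses the standard EM (multiplicative) updates for that model, and fixes the scale ambiguity of the bilinear form by forcing each column of X to sum to 1. The function name `alternating_em`, the count matrix `Y` (read classes by samples), and the starting matrix `X0` are all hypothetical names introduced here for illustration.

```python
import numpy as np

def alternating_em(Y, X0, n_iter=200, eps=1e-12):
    """Toy alternating EM for Y ~ Poisson(X @ B) with X and B both unknown.

    Y  : (E, S) counts for E read classes in S samples (assumed input layout).
    X0 : (E, K) non-negative starting design matrix for K isoforms.
    Returns X (E, K) with columns summing to 1, and B (K, S).
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X0, dtype=float)
    X = X / X.sum(axis=0, keepdims=True)       # columns are read-class proportions
    K = X.shape[1]
    B = np.tile(Y.sum(axis=0) / K, (K, 1))     # start: split each sample's reads equally
    for _ in range(n_iter):
        # Step 1: update B (isoform abundances per sample) with X held fixed.
        R = Y / (X @ B + eps)
        B = B * (X.T @ R)                      # valid because columns of X sum to 1
        # Step 2: update X with B held fixed, pooling information across samples.
        R = Y / (X @ B + eps)
        X = X * (R @ B.T)
        X = X / (X.sum(axis=0, keepdims=True) + eps)
    return X, B

# Hypothetical noise-free example: 3 read classes, 2 isoforms, 4 samples.
X_true = np.array([[0.6, 0.0], [0.4, 0.5], [0.0, 0.5]])
B_true = np.array([[100.0, 20.0, 50.0, 80.0], [10.0, 90.0, 40.0, 60.0]])
Y = X_true @ B_true
X0 = np.array([[0.5, 0.0], [0.5, 0.5], [0.0, 0.5]])  # rough starting guess
X_hat, B_hat = alternating_em(Y, X0)
```

Step 2 is the part that needs several samples: a single sample gives one count vector, which is not enough to estimate both X and the abundances, whereas pooling samples that share the same X makes the joint estimation feasible in this toy setting.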
Pages: 805-812
Number of pages: 8
Related papers
  • [41] A survey on identification and quantification of alternative polyadenylation sites from RNA-seq data
    Chen, Moliang
    Ji, Guoli
    Fu, Hongjuan
    Lin, Qianmin
    Ye, Congting
    Ye, Wenbin
    Su, Yaru
    Wu, Xiaohui
    BRIEFINGS IN BIOINFORMATICS, 2020, 21 (04) : 1261 - 1276
  • [42] Identifiability of isoform deconvolution from junction arrays and RNA-Seq
    Hiller, David
    Jiang, Hui
    Xu, Weihong
    Wong, Wing Hung
    BIOINFORMATICS, 2009, 25 (23) : 3056 - 3059
  • [43] Accurate quantification of transcriptome from RNA-Seq data by effective length normalization
    Lee, Soohyun
    Seo, Chae Hwa
    Lim, Byungho
    Yang, Jin Ok
    Oh, Jeongsu
    Kim, Minjin
    Lee, Sooncheol
    Lee, Byungwook
    Kang, Changwon
    Lee, Sanghyuk
    NUCLEIC ACIDS RESEARCH, 2011, 39 (02) : e9
  • [44] Impact of gene annotation choice on the quantification of RNA-seq data
    Chisanga, David
    Liao, Yang
    Shi, Wei
    BMC BIOINFORMATICS, 2022, 23 (01)
  • [45] Dynamic Model for RNA-seq Data Analysis
    Li, Lerong
    Xiong, Momiao
    BIOMED RESEARCH INTERNATIONAL, 2015, 2015
  • [46] Statistical inferences for isoform expression in RNA-Seq
    Jiang, Hui
    Wong, Wing Hung
    BIOINFORMATICS, 2009, 25 (08) : 1026 - 1032
  • [48] Simulation-based benchmarking of isoform quantification in single-cell RNA-seq
    Westoby, Jennifer
    Herrera, Marcela Sjoberg
    Ferguson-Smith, Anne C.
    Hemberg, Martin
    GENOME BIOLOGY, 2018, 19
  • [49] Estimation of Isoform Expression using Hierarchical Bayesian Model by RNA-seq
    Wang, Zengmiao
    Wang, Jun
    Deng, Minghua
    2015 34TH CHINESE CONTROL CONFERENCE (CCC), 2015, : 8554 - 8558