Alternating EM algorithm for a bilinear model in isoform quantification from RNA-seq data

Cited by: 13

Authors:

Deng, Wenjiang [1]

Mou, Tian [1]

Kalari, Krishna R. [2]

Niu, Nifang [3]

Wang, Liewei [3]

Pawitan, Yudi [1]

Trung Nghia Vu [1]

Affiliations:

[1] Karolinska Inst, Dept Med Epidemiol & Biostat, S-17177 Stockholm, Sweden

[2] Mayo Clin, Dept Hlth Sci Res, Rochester, MN 55905 USA

[3] Mayo Clin, Dept Mol Pharmacol & Expt Therapeut, Rochester, MN 55905 USA

Source:

BIOINFORMATICS | 2020 / Volume 36 / Issue 03

Funding:

Swedish Research Council;

Keywords:

EXPRESSION; ALIGNMENT; KINASE; READS;

DOI:

10.1093/bioinformatics/btz640

Chinese Library Classification number:

Q5 [Biochemistry];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Motivation: Estimation of isoform-level gene expression from RNA-seq data depends on simplifying assumptions, such as uniform read distribution, that are easily violated in real data. Such violations typically lead to biased estimates. Most existing methods provide one or more bias-correction steps, which are based on biological considerations, such as GC content, and are applied to each sample separately. The main problem is that not all biases are known. Results: We have developed a novel method called XAEM based on a more flexible and robust statistical model. Existing methods are essentially based on a linear model X beta, where the design matrix X is known and is computed based on the simplifying assumptions. In contrast, XAEM considers X beta as a bilinear model with both X and beta unknown. Joint estimation of X and beta is made possible by a simultaneous analysis of multi-sample RNA-seq data. Compared to existing methods, XAEM automatically performs empirical correction of potentially unknown biases. We use an alternating expectation-maximization (AEM) algorithm, alternating between estimation of X and beta. For speed, XAEM uses quasi-mapping for read alignment. Overall, XAEM performs favorably compared to recent advanced methods. For simulated datasets, XAEM obtains higher accuracy for multiple-isoform genes. In a differential-expression analysis of a real single-cell RNA-seq dataset, XAEM achieves substantially better rediscovery rates in independent validation sets.
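
The alternating idea in the abstract can be illustrated with a small sketch. It is not the XAEM implementation. It assumes, for illustration only, that read counts per equivalence class are Poisson with mean X beta, that each column of X sums to 1, and that the zero pattern of X (which classes are compatible with which isoforms) is known. Under those assumptions each half-step is the standard multiplicative EM update for a Poisson mixture. The function name `alternating_em`, the starting matrix `X0`, and the simulated data are all hypothetical.

```python
import numpy as np


def alternating_em(Y, X0, n_iter=200, eps=1e-12):
    """Alternate EM updates of X and beta for the bilinear mean X @ B.

    Y  : (classes x samples) read counts, one column per sample.
    X0 : (classes x isoforms) starting design matrix; zeros are treated
         as structural and stay zero under the multiplicative updates.
    Returns X (columns sum to 1) and B (isoforms x samples).
    """
    X = X0 / X0.sum(axis=0, keepdims=True)
    n_iso = X.shape[1]
    # Start by splitting each sample's reads evenly across isoforms.
    B = np.tile(Y.sum(axis=0) / n_iso, (n_iso, 1))
    for _ in range(n_iter):
        # beta-step: EM update of each sample's abundances with X fixed.
        R = Y / (X @ B + eps)
        B = B * (X.T @ R)
        # X-step: EM update of X with beta fixed, pooling all samples,
        # followed by renormalising each column to sum to 1.
        R = Y / (X @ B + eps)
        X = X * (R @ B.T)
        X = X / (X.sum(axis=0, keepdims=True) + eps)
    return X, B


# Simulated toy gene: 3 equivalence classes, 2 isoforms, 8 samples.
rng = np.random.default_rng(0)
X_true = np.array([[0.6, 0.0],
                   [0.4, 0.3],
                   [0.0, 0.7]])
B_true = rng.uniform(50, 500, size=(2, 8))
Y = rng.poisson(X_true @ B_true).astype(float)

# Assumed starting point: uniform weights within the known support.
X0 = (X_true > 0).astype(float)
X_hat, B_hat = alternating_em(Y, X0)
```

With a single sample, X and beta could not be separated, since only their product enters the mean. Pooling several samples in the X-step is what makes the joint estimate possible in this sketch, which mirrors the abstract's point about multi-sample analysis.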


Pages: 805-812

Page count: 8
