Using non-uniform read distribution models to improve isoform expression inference in RNA-Seq

Cited by: 76
Authors
Wu, Zhengpeng
Wang, Xi
Zhang, Xuegong [1]
Affiliations
[1] Tsinghua Univ, TNLIST Dept Automat, MOE Key Lab Bioinformat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
MESSENGER-RNA; TRANSCRIPTOME; DISEASE; PARKIN; CHIP;
DOI
10.1093/bioinformatics/btq696
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: RNA-Seq technology based on next-generation sequencing provides the unprecedented ability of studying transcriptomes at high resolution and accuracy, and the potential of measuring expression of multiple isoforms from the same gene at high precision. Solved by maximum likelihood estimation, isoform expression can be inferred in RNA-Seq using statistical models based on the assumption that sequenced reads are distributed uniformly along transcripts. Modification of the model is needed when considering situations where RNA-Seq data do not follow uniform distribution.
Results: We proposed two curves, the global bias curve (GBC) and the local bias curves (LBCs), to describe the non-uniformity of read distributions for all genes in a transcriptome and for each gene, respectively. Incorporating the bias curves into the uniform read distribution (URD) model, we introduced non-URD (N-URD) models to infer isoform expression levels. On a series of systematic simulation studies, the proposed models outperform the original model in recovering major isoforms and the expression ratio of alternative isoforms. We also applied the new model to real RNA-Seq datasets and found that its inferences on expression ratios of alternative isoforms are more reasonable. The experiments indicate that incorporating N-URD information can improve the accuracy in modeling and inferring isoform expression in RNA-Seq.
Pages: 502-508
Number of pages: 7
Related papers
50 in total
  • [41] Isoform-level microRNA-155 target prediction using RNA-seq
    Deng, Nan
    Puetter, Adriane
    Zhang, Kun
    Johnson, Kristen
    Zhao, Zhiyu
    Taylor, Christopher
    Flemington, Erik K.
    Zhu, Dongxiao
    NUCLEIC ACIDS RESEARCH, 2011, 39 (09) : e61
  • [42] Blind spots of quantitative RNA-seq: the limits for assessing abundance, differential expression, and isoform switching
    Rehrauer, Hubert
    Opitz, Lennart
    Tan, Ge
    Sieverling, Lina
    Schlapbach, Ralph
    BMC BIOINFORMATICS, 2013, 14
  • [43] Recommendations for Accurate Resolution of Gene and Isoform Allele-Specific Expression in RNA-Seq Data
    Wood, David L. A.
    Nones, Katia
    Steptoe, Anita
    Christ, Angelika
    Harliwong, Ivon
    Newell, Felicity
    Bruxner, Timothy J. C.
    Miller, David
    Cloonan, Nicole
    Grimmond, Sean M.
    PLOS ONE, 2015, 10 (05)
  • [45] Parseq: reconstruction of microbial transcription landscape from RNA-Seq read counts using state-space models
    Mirauta, Bogdan
    Nicolas, Pierre
    Richard, Hugues
    BIOINFORMATICS, 2014, 30 (10) : 1409 - 1416
  • [46] Sensitive gene fusion detection using ambiguously mapping RNA-Seq read pairs
    Kinsella, Marcus
    Harismendy, Olivier
    Nakano, Masakazu
    Frazer, Kelly A.
    Bafna, Vineet
    BIOINFORMATICS, 2011, 27 (08) : 1068 - 1075
  • [47] Combining Multiple RNA-Seq Data Analysis Algorithms Using Machine Learning Improves Differential Isoform Expression Analysis
    Dimopoulos, Alexandros C.
    Koukoutegos, Konstantinos
    Psomopoulos, Fotis E.
    Moulos, Panagiotis
    METHODS AND PROTOCOLS, 2021, 4 (04)
  • [48] DECONVOLUTION OF BASE PAIR LEVEL RNA-SEQ READ COUNTS FOR QUANTIFICATION OF TRANSCRIPT EXPRESSION LEVELS
    Wu, Han
    Zhu, Yu
    ANNALS OF APPLIED STATISTICS, 2016, 10 (03) : 1195 - 1216
  • [49] Bias Correction in RNA-Seq Short-Read Counts Using Penalized Regression
    Dalpiaz D.
    He X.
    Ma P.
    Statistics in Biosciences, 2013, 5 (1) : 88 - 99
  • [50] PM-Seq: Using Finite Poisson Mixture Models for RNA-Seq Data Analysis and Transcript Expression Level Quantification
    Wu H.
    Qin Z.
    Zhu Y.
    Statistics in Biosciences, 2013, 5 (1) : 71 - 87