Accurate Estimation of Expression Levels of Homologous Genes in RNA-seq Experiments

Cited by: 18
Authors
Pasaniuc, Bogdan [1, 2]
Zaitlen, Noah [1, 2]
Halperin, Eran [3, 4, 5]
Affiliations
[1] Harvard Univ, Sch Publ Hlth, Dept Epidemiol, Boston, MA 02115 USA
[2] Harvard Univ, Sch Publ Hlth, Dept Biostat, Boston, MA 02115 USA
[3] Int Comp Sci Inst, Berkeley, CA 94704 USA
[4] Tel Aviv Univ, Mol Microbiol & Biotechnol Dept, IL-69978 Tel Aviv, Israel
[5] Tel Aviv Univ, Blavatnik Sch Comp Sci, IL-69978 Tel Aviv, Israel
Funding
Israel Science Foundation; US National Science Foundation
Keywords
algorithms; gene searching; genetic mapping; genetic variation; TRANSCRIPTOMES; REVEALS; GENOME; MOUSE;
DOI
10.1089/cmb.2010.0259
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Next-generation high-throughput sequencing (NGS) is poised to replace array-based technologies as the experiment of choice for measuring RNA expression levels. Several groups have demonstrated the power of this new approach (RNA-seq), making significant and novel contributions and simultaneously proposing methodologies for the analysis of RNA-seq data. In a typical experiment, millions of short sequences (reads) are sampled from RNA extracts and mapped back to a reference genome. The number of reads mapping to each gene is used as a proxy for its corresponding RNA concentration. A significant challenge in analyzing RNA expression of homologous genes is the large fraction of the reads that map to multiple locations in the reference genome. Currently, these reads are either dropped from the analysis, or a naive algorithm is used to estimate their underlying distribution. In this work, we present a rigorous alternative for handling the reads generated in an RNA-seq experiment within a probabilistic model for RNA-seq data; we develop maximum likelihood-based methods for estimating the model parameters. In contrast to previous methods, our model takes into account the fact that the DNA of the sequenced individual is not a perfect copy of the reference sequence. We show with both simulated and real RNA-seq data that our new method improves the accuracy and power of RNA-seq experiments.
Pages: 459-468
Page count: 10