TIGAR2: sensitive and accurate estimation of transcript isoform expression with longer RNA-Seq reads

Times cited: 28
Authors
Nariai, Naoki [1]
Kojima, Kaname [1]
Mimori, Takahiro [1]
Sato, Yukuto [1]
Kawai, Yosuke [1]
Yamaguchi-Kabata, Yumi [1]
Nagasaki, Masao [1]
Affiliation
[1] Tohoku Univ, Tohoku Med Megabank Org, Dept Integrat Genom, Aoba Ku, Sendai, Miyagi 9808573, Japan
Source
BMC GENOMICS, 2014, Vol. 15
Keywords
REFERENCE GENOME; ALIGNMENT; GENE; QUANTIFICATION; REVEALS
DOI
10.1186/1471-2164-15-S10-S5
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline codes
071005; 0836; 090102; 100705
Abstract
Background: High-throughput RNA sequencing (RNA-Seq) enables the identification and quantification of transcripts at single-base resolution. Longer sequence reads have recently become available, thanks to new sequencing technologies and to improved chemical reagents for next-generation sequencers. Although several computational methods have been proposed for quantifying gene expression levels from RNA-Seq data, they are not sufficiently optimized for longer reads (e.g., > 250 bp).

Results: We propose TIGAR2, a statistical method for quantifying transcript isoforms from fixed-length and variable-length RNA-Seq data. Our method models sequencer substitution, deletion, and insertion errors based on gapped alignments of reads to the reference cDNA sequences, so that sensitive read aligners such as Bowtie2 and BWA-MEM can be incorporated effectively into our pipeline. A heuristic algorithm is also implemented within variational Bayesian inference for faster computation. We apply TIGAR2 to both simulated data and real data from human samples and compare its transcript quantification performance with that of existing methods.

Conclusions: TIGAR2 is a sensitive and accurate tool for quantifying transcript isoform abundances from RNA-Seq data. Our method outperforms existing methods on fixed-length reads (100, 250, 500, and 1000 bp, both single-end and paired-end) and on variable-length reads, especially for reads longer than 250 bp.
Pages: 9
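The abstract describes variational Bayesian inference of isoform abundances from gapped read alignments that carry substitution, insertion, and deletion errors. The sketch below is a generic illustration of that idea under stated assumptions, not TIGAR2's actual model or its heuristic speed-up (the abstract does not give those details). The helper `alignment_log_likelihood` is a hypothetical toy error model with made-up default error rates, and `vb_abundances` is a hypothetical, standard mean-field variational Bayes update for a Dirichlet-multinomial mixture over transcripts.

```python
import math


def digamma(x: float) -> float:
    """Digamma function for x > 0: upward recurrence, then asymptotic series."""
    result = 0.0
    # psi(x) = psi(x + 1) - 1/x; shift until the asymptotic series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += (math.log(x) - 0.5 * inv
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result


def alignment_log_likelihood(n_match, n_sub, n_ins, n_del, eff_len,
                             p_sub=0.01, p_ins=0.001, p_del=0.001):
    """Hypothetical toy score of one gapped alignment to one transcript.

    Each aligned read base is a match, a substitution (to one of 3 other
    bases), or an insertion; each deleted reference base costs p_del.
    The start position is uniform over the effective transcript length.
    This is not a normalized generative model, only an illustration.
    """
    p_match = 1.0 - p_sub - p_ins
    return (n_match * math.log(p_match) + n_sub * math.log(p_sub / 3.0)
            + n_ins * math.log(p_ins) + n_del * math.log(p_del)
            - math.log(eff_len))


def vb_abundances(read_loglik, n_transcripts, alpha0=1.0, n_iter=100, tol=1e-8):
    """Mean-field VB for read-to-transcript proportions (hypothetical sketch).

    read_loglik: one dict per read, {transcript_index: log p(read | transcript)}.
    Returns the posterior mean of the Dirichlet over transcript proportions.
    """
    alpha = [alpha0 + len(read_loglik) / n_transcripts] * n_transcripts
    for _ in range(n_iter):
        total = sum(alpha)
        # E_q[log theta_k] under the current Dirichlet q(theta).
        e_log_theta = [digamma(a) - digamma(total) for a in alpha]
        counts = [0.0] * n_transcripts
        for hits in read_loglik:
            # Responsibilities r_nk ∝ exp(E[log theta_k] + log p(read_n | k)).
            logs = {k: e_log_theta[k] + ll for k, ll in hits.items()}
            m = max(logs.values())
            weights = {k: math.exp(v - m) for k, v in logs.items()}
            z = sum(weights.values())
            for k, w in weights.items():
                counts[k] += w / z
        # Dirichlet update: prior pseudo-count plus expected read counts.
        new_alpha = [alpha0 + c for c in counts]
        diff = max(abs(a - b) for a, b in zip(new_alpha, alpha))
        alpha = new_alpha
        if diff < tol:
            break
    total = sum(alpha)
    return [a / total for a in alpha]
```

For example, with two transcripts, three reads unique to transcript 0, one read unique to transcript 1, and two reads that align equally well to both, `vb_abundances([{0: 0.0}] * 3 + [{1: 0.0}] + [{0: 0.0, 1: 0.0}] * 2, n_transcripts=2)` assigns the ambiguous reads mostly to transcript 0, because transcript 0's uniquely assigned reads raise its expected log-proportion. Both proportions stay positive because of the Dirichlet prior.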