A probabilistic framework for aligning paired-end RNA-seq data

Cited by: 14
Authors
Hu, Yin [1 ]
Wang, Kai [1 ]
He, Xiaping [2 ]
Chiang, Derek Y. [2 ]
Prins, Jan F. [3 ]
Liu, Jinze [1 ]
Affiliations
[1] Univ Kentucky, Dept Comp Sci, Lexington, KY 40506 USA
[2] Univ N Carolina, Dept Genet, Chapel Hill, NC USA
[3] Univ N Carolina, Dept Comp Sci, Chapel Hill, NC USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
DOI
10.1093/bioinformatics/btq336
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: The RNA-seq paired-end read (PER) protocol samples transcript fragments longer than the sequencing capability of today's technology by sequencing just the two ends of each fragment. Deep sampling of the transcriptome using the PER protocol presents the opportunity to reconstruct the unsequenced portion of each transcript fragment using end reads from overlapping PERs, guided by the expected length of the fragment.
Methods: A probabilistic framework is described to predict the alignment to the genome of all PER transcript fragments in a PER dataset. Starting from possible exonic and spliced alignments of all end reads, the method constructs potential splicing paths connecting paired ends. An expectation-maximization (EM) method assigns likelihood values to all splice junctions and assigns the most probable alignment to each transcript fragment.
Results: The method was applied to 2 × 35 bp PER datasets from the cancer cell lines MCF-7 and SUM-102. PER fragment alignment increased coverage 3-fold compared with alignment of the end reads alone, and increased the accuracy of splice detection. The accuracy of the EM algorithm in the presence of alternative paths in the splice graph was validated by qRT-PCR experiments on eight exon-skipping alternative splicing events. PER fragment alignment with long-range splicing confirmed 8 of 10 fusion events identified in the MCF-7 cell line in an earlier study (Maher et al., 2009).
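The EM step described under Methods can be illustrated with a small toy model. The sketch below is not the authors' implementation: the Gaussian fragment-length model, the product-of-junction-weights path score, the update rule, and every name in it (`length_prob`, `em_align`, `pseudo`, the junction labels) are assumptions made for illustration only. Each fragment carries a list of candidate splicing paths; the E-step weights the paths by length plausibility and current junction weights, and the M-step re-estimates each junction weight from the expected number of fragments using it.

```python
import math
from collections import defaultdict


def length_prob(length, mean, sd):
    """Gaussian density of an inferred fragment length (assumed length model)."""
    z = (length - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_align(fragments, mean=200.0, sd=20.0, iters=50, pseudo=1e-3):
    """Toy EM over candidate splicing paths (hypothetical model, not the paper's).

    fragments maps a fragment id to a non-empty list of candidate paths; each
    path is (tuple_of_junction_ids, inferred_fragment_length).
    Returns (junction weights, highest-scoring path per fragment).
    """
    # For each junction, count the fragments that list it in any candidate path.
    candidates = defaultdict(int)
    for paths in fragments.values():
        for j in {j for junctions, _ in paths for j in junctions}:
            candidates[j] += 1
    theta = {j: 0.5 for j in candidates}  # uninformative starting weights

    def path_scores(paths):
        # Score = length plausibility times the weights of the junctions used.
        return [length_prob(length, mean, sd) * math.prod(theta[j] for j in junctions)
                for junctions, length in paths]

    for _ in range(iters):
        # E-step: posterior responsibility of each candidate path per fragment.
        expected = defaultdict(float)
        for paths in fragments.values():
            scores = path_scores(paths)
            total = sum(scores)
            if total > 0:
                resp = [s / total for s in scores]
            else:  # every score underflowed: fall back to a uniform split
                resp = [1.0 / len(paths)] * len(paths)
            for (junctions, _), r in zip(paths, resp):
                for j in junctions:
                    expected[j] += r
        # M-step: junction weight = smoothed fraction of candidate fragments
        # that are expected to use the junction.
        theta = {j: (expected[j] + pseudo) / (candidates[j] + 2.0 * pseudo)
                 for j in candidates}

    best = {}
    for fid, paths in fragments.items():
        scores = path_scores(paths)
        best[fid] = paths[scores.index(max(scores))]
    return theta, best


# Toy input: junction "A-B" is well supported; "A-C" implies implausible lengths
# for f1 and f3, and is equally plausible by length alone for f4.
toy = {
    "f1": [(("A-B",), 200), (("A-C",), 120)],
    "f2": [(("A-B",), 210)],
    "f3": [(("A-B",), 195), (("A-C",), 115)],
    "f4": [(("A-B",), 180), (("A-C",), 220)],
}
theta, best = em_align(toy)
print(theta)
print(best["f4"])
```

In this toy input, fragment `f4` has two candidate paths that are equally likely under the length model alone, so the choice between them is driven by the junction weights learned from the other fragments.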
Pages: 1950-1957
Page count: 8