A probabilistic framework for aligning paired-end RNA-seq data

Cited by: 14
Authors
Hu, Yin [1]
Wang, Kai [1]
He, Xiaping [2]
Chiang, Derek Y. [2]
Prins, Jan F. [3]
Liu, Jinze [1]
Affiliations
[1] Univ Kentucky, Dept Comp Sci, Lexington, KY 40506 USA
[2] Univ N Carolina, Dept Genet, Chapel Hill, NC USA
[3] Univ N Carolina, Dept Comp Sci, Chapel Hill, NC USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
DOI
10.1093/bioinformatics/btq336
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The RNA-seq paired-end read (PER) protocol samples transcript fragments longer than the sequencing capability of today's technology by sequencing just the two ends of each fragment. Deep sampling of the transcriptome using the PER protocol presents the opportunity to reconstruct the unsequenced portion of each transcript fragment using end reads from overlapping PERs, guided by the expected length of the fragment.

Methods: A probabilistic framework is described to predict the alignment to the genome of all PER transcript fragments in a PER dataset. Starting from possible exonic and spliced alignments of all end reads, our method constructs potential splicing paths connecting paired ends. An expectation maximization (EM) method assigns likelihood values to all splice junctions and assigns the most probable alignment to each transcript fragment.

Results: The method was applied to 2x35 bp PER datasets from cancer cell lines MCF-7 and SUM-102. PER fragment alignment increased the coverage 3-fold compared to the alignment of the end reads alone, and increased the accuracy of splice detection. The accuracy of the EM algorithm in the presence of alternative paths in the splice graph was validated by qRT-PCR experiments on eight exon-skipping alternative splicing events. PER fragment alignment with long-range splicing confirmed 8 out of 10 fusion events identified in the MCF-7 cell line in an earlier study by Maher et al. (2009).
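
The EM step described in the Methods can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the model, so the code assumes a plain mixture model in which each candidate splicing path has a weight, each fragment lists its candidate paths together with a fragment-length likelihood, and a junction's support is the total weight of the paths that use it. The function name `em_paths`, the input layout, and the junction labels "A" and "B" are all hypothetical.

```python
from collections import defaultdict


def em_paths(fragments, n_iter=100):
    """Toy EM over candidate splicing paths (illustrative assumption only).

    fragments: list of dicts, one per paired-end fragment, mapping a
    candidate path (tuple of junction ids) to a positive fragment-length
    likelihood for that path.
    """
    paths = {p for cands in fragments for p in cands}
    weight = {p: 1.0 / len(paths) for p in paths}  # uniform start
    post = []
    for _ in range(n_iter):
        # E-step: posterior of each candidate path for each fragment.
        post = []
        for cands in fragments:
            scores = {p: weight[p] * lik for p, lik in cands.items()}
            total = sum(scores.values())
            post.append({p: s / total for p, s in scores.items()})
        # M-step: a path's weight is its expected share of the fragments.
        counts = defaultdict(float)
        for resp in post:
            for p, w in resp.items():
                counts[p] += w
        weight = {p: counts[p] / len(fragments) for p in paths}
    # Junction support: summed weight of the paths containing the junction.
    support = defaultdict(float)
    for p, w in weight.items():
        for j in p:
            support[j] += w
    # Most probable candidate path per fragment.
    best = [max(resp, key=resp.get) for resp in post]
    return dict(support), best


# Two fragments fit only junction "A"; a third fits "A" or "B" equally well.
fragments = [
    {("A",): 1.0},
    {("A",): 1.0},
    {("A",): 0.5, ("B",): 0.5},
]
junction_support, best = em_paths(fragments)
```

In this example the unambiguous fragments pull weight toward junction "A", so the ambiguous third fragment is also assigned to the path through "A" and the support for "B" shrinks toward zero.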
Pages: 1950-1957
Number of pages: 8