A probabilistic framework for aligning paired-end RNA-seq data

Cited by: 14
Authors
Hu, Yin [1]
Wang, Kai [1]
He, Xiaping [2]
Chiang, Derek Y. [2]
Prins, Jan F. [3]
Liu, Jinze [1]
Affiliations
[1] Univ Kentucky, Dept Comp Sci, Lexington, KY 40506 USA
[2] Univ N Carolina, Dept Genet, Chapel Hill, NC USA
[3] Univ N Carolina, Dept Comp Sci, Chapel Hill, NC USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
DOI
10.1093/bioinformatics/btq336
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The RNA-seq paired-end read (PER) protocol samples transcript fragments longer than the sequencing capability of today's technology by sequencing just the two ends of each fragment. Deep sampling of the transcriptome using the PER protocol presents the opportunity to reconstruct the unsequenced portion of each transcript fragment using end reads from overlapping PERs, guided by the expected length of the fragment.

Methods: A probabilistic framework is described to predict the alignment to the genome of all PER transcript fragments in a PER dataset. Starting from possible exonic and spliced alignments of all end reads, our method constructs potential splicing paths connecting paired ends. An expectation maximization (EM) method assigns likelihood values to all splice junctions and assigns the most probable alignment for each transcript fragment.

Results: The method was applied to 2x35 bp PER datasets from cancer cell lines MCF-7 and SUM-102. PER fragment alignment increased the coverage 3-fold compared to the alignment of the end reads alone, and increased the accuracy of splice detection. The accuracy of the EM algorithm in the presence of alternative paths in the splice graph was validated by qRT-PCR experiments on eight exon-skipping alternative splicing events. PER fragment alignment with long-range splicing confirmed 8 out of 10 fusion events identified in the MCF-7 cell line in an earlier study by Maher et al. (2009).
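
Illustrative sketch (not part of the original record): the abstract describes an EM procedure that scores splice junctions and then picks the most probable alignment for each fragment. The toy Python code below shows one way such a loop could look. It is not the authors' model. The Gaussian fragment-length prior, its `mean` and `sd` values, the input layout (each fragment as a non-empty list of `(junction_ids, implied_length)` candidates), the junction weight update, and the names `length_prior` and `em_align` are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def length_prior(length, mean=200.0, sd=30.0):
    """Unnormalized Gaussian weight for an implied fragment length (assumed prior)."""
    z = (length - mean) / sd
    return math.exp(-0.5 * z * z)

def em_align(fragments, n_iter=50):
    """Toy EM over candidate splicing paths.

    fragments: list of candidate lists; each candidate is a pair
    (tuple of junction ids, implied fragment length). Hypothetical layout.
    Returns (junction weights, index of the best candidate per fragment).
    """
    junctions = {j for cands in fragments for js, _ in cands for j in js}
    theta = {j: 1.0 for j in junctions}  # start with every junction fully trusted

    # For each junction, count the fragments that could use it.
    n_possible = defaultdict(int)
    for cands in fragments:
        for j in {j for js, _ in cands for j in js}:
            n_possible[j] += 1

    posteriors = []
    for _ in range(n_iter):
        posteriors = []
        expected = defaultdict(float)
        # E-step: posterior over candidate paths under the current weights.
        for cands in fragments:
            scores = [length_prior(length) * math.prod(theta[j] for j in js)
                      for js, length in cands]
            total = sum(scores) or 1.0
            post = [s / total for s in scores]
            posteriors.append(post)
            for (js, _), p in zip(cands, post):
                for j in set(js):
                    expected[j] += p
        # M-step: weight = expected share of eligible fragments using the junction.
        theta = {j: max(expected[j] / n_possible[j], 1e-9) for j in junctions}

    # Assign each fragment its highest-posterior candidate path.
    best = [max(range(len(p)), key=p.__getitem__) for p in posteriors]
    return theta, best

if __name__ == "__main__":
    # One fragment supports only junction "j1"; two are ambiguous between "j1" and "j2".
    example = [
        [(("j1",), 200)],
        [(("j1",), 200), (("j2",), 200)],
        [(("j1",), 200), (("j2",), 200)],
    ]
    weights, assignment = em_align(example)
    print(weights, assignment)
```

In this toy input the unambiguous fragment shifts weight toward "j1", so the ambiguous fragments are pulled toward the same junction over the iterations. A real aligner would build the candidate paths from a splice graph and use an empirically estimated fragment-length distribution.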
Pages: 1950-1957
Number of pages: 8