Contrasting and combining transcriptome complexity captured by short and long RNA sequencing reads

Cited by: 2
Authors
Han, Seong Woo [1]
Jewell, San [2]
Thomas-Tikhonenko, Andrei [3, 4]
Barash, Yoseph [1, 2]
Affiliations
[1] Univ Penn, Sch Engn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA
[2] Univ Penn, Perelman Sch Med, Dept Genet, Philadelphia, PA 19104 USA
[3] Univ Penn, Perelman Sch Med, Dept Pathol & Lab Med, Philadelphia, PA 19104 USA
[4] Childrens Hosp Philadelphia, Div Canc Pathobiol, Philadelphia, PA 19104 USA
Funding
National Institutes of Health (USA)
DOI
10.1101/gr.278659.123
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Mapping transcriptomic variations using either short- or long-read RNA sequencing is a staple of genomic research. Long reads are able to capture entire isoforms and overcome repetitive regions, whereas short reads still provide improved coverage and error rates. Yet, open questions remain, such as how to quantitatively compare the technologies, can we combine them, and what is the benefit of such a combined view? We tackle these questions by first creating a pipeline to assess matched long- and short-read data using a variety of transcriptome statistics. We find that across data sets, algorithms, and technologies, matched short-read data detects ~30% more splice junctions, such that ~10%-30% of the splice junctions included at ≥20% by short reads are missed by long reads. In contrast, long reads detect many more intron-retention events and can detect full isoforms, pointing to the benefit of combining the technologies. We introduce MAJIQ-L, an extension of the MAJIQ software, to enable a unified view of transcriptome variations from both technologies and demonstrate its benefits. Our software can be used to assess any future long-read technology or algorithm and can be combined with short-read data for improved transcriptome analysis.
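The junction comparison described in the abstract can be illustrated with a minimal sketch. It is not the MAJIQ-L implementation; the function name `missed_fraction`, the junction tuple layout, and the toy inclusion values are all hypothetical and chosen only for illustration. It assumes each short-read junction already has an inclusion level (a value between 0 and 1) and asks what fraction of the junctions at or above a threshold are absent from the long-read set.

```python
from typing import Dict, Set, Tuple

# Hypothetical junction key: (chromosome, start, end, strand).
Junction = Tuple[str, int, int, str]


def missed_fraction(
    short_inclusion: Dict[Junction, float],
    long_junctions: Set[Junction],
    min_inclusion: float = 0.2,
) -> float:
    """Fraction of short-read junctions with inclusion >= min_inclusion
    that do not appear in the long-read junction set."""
    confident = {j for j, level in short_inclusion.items() if level >= min_inclusion}
    if not confident:
        return 0.0
    missed = confident - long_junctions
    return len(missed) / len(confident)


# Toy data (illustrative only, not taken from the paper).
short_inclusion = {
    ("chr1", 100, 200, "+"): 0.90,
    ("chr1", 100, 300, "+"): 0.50,
    ("chr1", 250, 400, "+"): 0.25,
    ("chr1", 500, 600, "+"): 0.05,  # below the threshold, ignored
}
long_junctions = {
    ("chr1", 100, 200, "+"),
    ("chr1", 100, 300, "+"),
    ("chr2", 10, 90, "-"),
}

fraction = missed_fraction(short_inclusion, long_junctions)
```

In this toy case three short-read junctions pass the 20% threshold and one of them is absent from the long-read set, so `fraction` is one third.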
Pages: 1624-1635
Page count: 12