mRIN for direct assessment of genome-wide and gene-specific mRNA integrity from large-scale RNA-sequencing data

Cited by: 42
Authors
Feng, Huijuan [1, 2, 3]
Zhang, Xuegong [1, 2]
Zhang, Chaolin [3]
Affiliations
[1] Tsinghua Univ, MOE Key Lab Bioinformat, Beijing 100084, Peoples R China
[2] Tsinghua Univ, TNLIST, Bioinformat Div, Dept Automat, Beijing 100084, Peoples R China
[3] Columbia Univ, Dept Syst Biol, Dept Biochem & Mol Biophys, Ctr Motor Neuron Biol & Dis, New York, NY 10032 USA
Source
NATURE COMMUNICATIONS | 2015 / Vol. 6
Funding
US National Institutes of Health
Keywords
QUALITY-CONTROL; HUMAN BRAIN; SEQ DATA; ISOFORM EXPRESSION; DEGRADATION; DECAY; TRANSCRIPTOME; QUANTIFICATION; SITES; MOUSE;
DOI
10.1038/ncomms8816
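Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract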
The volume of RNA-Seq data sets in public repositories has been expanding exponentially, providing unprecedented opportunities to study gene expression regulation. Because degraded RNA samples, such as those collected from post-mortem tissues, can result in distinct expression profiles with potential biases, a particularly important step in mining these data is quality control. Here we develop a method named mRIN to directly assess mRNA integrity from RNA-Seq data at the sample and individual gene level. We systematically analyse large-scale RNA-Seq data sets of the human brain transcriptome generated by different consortia. Our analysis demonstrates that 3' bias resulting from partial RNA fragmentation in post-mortem tissues has a marked impact on global expression profiles, and that mRIN effectively identifies samples with different levels of mRNA degradation. Unexpectedly, this process has a reproducible and gene-specific component, and transcripts with different stabilities are associated with distinct functions and structural features reminiscent of mRNA decay in living cells.
Pages: 10