SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence

Cited by: 54
Authors
Lopez-Maestre, Helene [1, 2, 3, 4]
Brinza, Lilia [5]
Marchet, Camille [6, 7]
Kielbassa, Janice [8]
Bastien, Sylvere [1, 2, 3, 4]
Boutigny, Mathilde [1, 2, 3, 4]
Monnin, David [1, 2, 3]
El Filali, Adil [1, 2, 3]
Carareto, Claudia Marcia [9]
Vieira, Cristina [1, 2, 3, 4]
Picard, Franck [1, 2, 3]
Kremer, Natacha [1, 2, 3]
Vavre, Fabrice [1, 2, 3, 4]
Sagot, Marie-France [1, 2, 3, 4]
Lacroix, Vincent [1, 2, 3, 4]
Affiliations
[1] Univ Lyon, F-69000 Lyon, France
[2] Univ Lyon 1, F-69622 Villeurbanne, France
[3] CNRS, UMR5558, Lab Biometrie & Biol Evolut, F-69622 Villeurbanne, France
[4] EPI ERABLE Inria Grenoble, Rhone Alpes, France
[5] BIOASTER, PT Genom & Transcriptom, Lyon, France
[6] Univ Rennes, F-35000 Rennes, France
[7] IRISA, Equipe GenScale, Rennes, France
[8] Univ Lyon 1, Ctr Leon Berard, Synergie Lyon Canc, Lyon, France
[9] UNESP Sao Paulo State Univ, Dept Biol, Sao Paulo, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil; European Research Council
Keywords
ALIGNMENT; TRANSCRIPTOME; POLYMORPHISM; OOGENESIS; EFFICIENT; VARIANTS; UNCOVERS; PROGRAM
DOI
10.1093/nar/gkw655
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline codes
071010; 081704
Abstract
SNPs (Single Nucleotide Polymorphisms) are genetic markers whose precise identification is a prerequisite for association studies. Methods to identify them are currently well developed for model species, but they rely on the availability of a (good) reference genome and therefore cannot be applied to non-model species. They are also mostly tailored to whole-genome (re-)sequencing experiments, whereas in many cases transcriptome sequencing can serve as a cheaper alternative that already makes it possible to identify SNPs located in transcribed regions. In this paper, we propose a method that identifies, quantifies and annotates SNPs without any reference genome, using RNA-seq data only. Individuals can be pooled prior to sequencing if not enough material is available from a single individual. Using pooled human RNA-seq data, we assess the precision and recall of our method and discuss them relative to those of other methods that use a reference genome or an assembled transcriptome. We then experimentally validate the predictions of our method using RNA-seq data from two non-model species. The method can be used for any species to annotate SNPs and predict their impact on the protein sequence. It also makes it possible to test for association between the identified SNPs and a phenotype of interest.
Pages: 13
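The abstract states that the method predicts the impact of each SNP on the protein sequence. The record does not describe how this is done, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the SNP has already been placed in a reading frame (for example, on an assembled transcript with a predicted open reading frame), so that the reference codon and the SNP's position within it (0, 1 or 2) are known. It then uses the standard genetic code (NCBI translation table 1) to label the change as synonymous, missense, nonsense or stop-loss. The names `CODON_TABLE` and `classify_snp` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order with the third base varying fastest, matching the amino-acid string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(codon: str, position: int, alt_base: str) -> str:
    """Classify a SNP at `position` (0, 1 or 2) of a reference `codon`.

    Hypothetical helper: assumes the reading frame is already known.
    Raises KeyError for codons containing non-ACGT characters (e.g. 'N').
    """
    codon = codon.upper()
    alt_base = alt_base.upper()
    if len(codon) != 3 or not 0 <= position <= 2:
        raise ValueError("expected a 3-nt codon and a position in 0..2")
    if alt_base == codon[position]:
        raise ValueError("alternative base equals the reference base")

    # Build the alternative codon by substituting the single base.
    alt_codon = codon[:position] + alt_base + codon[position + 1:]
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[alt_codon]

    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid (or stop codon retained)
    if alt_aa == "*":
        return "nonsense"     # amino acid replaced by a premature stop
    if ref_aa == "*":
        return "stop-loss"    # stop codon replaced by an amino acid
    return "missense"         # one amino acid replaced by another


if __name__ == "__main__":
    print(classify_snp("GAA", 2, "G"))  # GAA (Glu) -> GAG (Glu): synonymous
    print(classify_snp("GAA", 1, "T"))  # GAA (Glu) -> GTA (Val): missense
    print(classify_snp("GAA", 0, "T"))  # GAA (Glu) -> TAA (stop): nonsense
```

A lookup table is used rather than a translation library so the sketch stays self-contained. Non-standard genetic codes, such as mitochondrial ones, would require a different amino-acid string.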