SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence

Cited by: 54
Authors
Lopez-Maestre, Helene [1 ,2 ,3 ,4 ]
Brinza, Lilia [5 ]
Marchet, Camille [6 ,7 ]
Kielbassa, Janice [8 ]
Bastien, Sylvere [1 ,2 ,3 ,4 ]
Boutigny, Mathilde [1 ,2 ,3 ,4 ]
Monnin, David [1 ,2 ,3 ]
El Filali, Adil [1 ,2 ,3 ]
Carareto, Claudia Marcia [9 ]
Vieira, Cristina [1 ,2 ,3 ,4 ]
Picard, Franck [1 ,2 ,3 ]
Kremer, Natacha [1 ,2 ,3 ]
Vavre, Fabrice [1 ,2 ,3 ,4 ]
Sagot, Marie-France [1 ,2 ,3 ,4 ]
Lacroix, Vincent [1 ,2 ,3 ,4 ]
Affiliations
[1] Univ Lyon, F-69000 Lyon, France
[2] Univ Lyon 1, F-69622 Villeurbanne, France
[3] CNRS, UMR5558, Lab Biometrie & Biol Evolut, F-69622 Villeurbanne, France
[4] EPI ERABLE Inria Grenoble, Rhone Alpes, France
[5] BIOASTER, PT Genom & Transcriptom, Lyon, France
[6] Univ Rennes, F-35000 Rennes, France
[7] IRISA, Equipe GenScale, Rennes, France
[8] Univ Lyon 1, Ctr Leon Berard, Synergie Lyon Canc, Lyon, France
[9] UNESP Sao Paulo State Univ, Dept Biol, Sao Paulo, Brazil
Funding
São Paulo Research Foundation (Brazil); European Research Council;
Keywords
ALIGNMENT; TRANSCRIPTOME; POLYMORPHISM; OOGENESIS; EFFICIENT; VARIANTS; UNCOVERS; PROGRAM;
DOI
10.1093/nar/gkw655
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
SNPs (Single Nucleotide Polymorphisms) are genetic markers whose precise identification is a prerequisite for association studies. Methods to identify them are currently well developed for model species, but they rely on the availability of a (good) reference genome and therefore cannot be applied to non-model species. They are also mostly tailored for whole genome (re-)sequencing experiments, whereas in many cases transcriptome sequencing can be used as a cheaper alternative that already enables the identification of SNPs located in transcribed regions. In this paper, we propose a method that identifies, quantifies and annotates SNPs without any reference genome, using RNA-seq data only. Individuals can be pooled prior to sequencing if not enough material is available from one individual. Using pooled human RNA-seq data, we clarify the precision and recall of our method and discuss them with respect to other methods that use a reference genome or an assembled transcriptome. We then experimentally validate the predictions of our method using RNA-seq data from two non-model species. The method can be used for any species to annotate SNPs and predict their impact on the protein sequence. We further enable testing for the association of the identified SNPs with a phenotype of interest.
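The abstract evaluates the method by its precision and recall on pooled human RNA-seq data. As a minimal sketch of how these two measures are commonly computed for a set of SNP calls, the Python snippet below compares a predicted set against a reference set. The function name `precision_recall`, the `(chromosome, position, allele)` tuple representation and the toy data are all assumptions made for illustration; they are not taken from the paper or from its software.

```python
def precision_recall(predicted, truth):
    """Return (precision, recall) for two collections of SNP calls.

    Each call is assumed to be a hashable tuple such as
    (chromosome, position, alternative_allele); this representation
    is hypothetical and chosen only for the example.
    """
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)  # calls present in both sets
    # Precision: fraction of predicted SNPs that are in the reference set.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of reference SNPs that were predicted.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy data, invented for illustration only.
truth = {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr2", 40, "G"), ("chr2", 90, "C")}
predicted = {("chr1", 100, "A"), ("chr1", 250, "T"), ("chr3", 7, "G")}

precision, recall = precision_recall(predicted, truth)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy case two of the three predicted calls are in the reference set and two of the four reference calls are recovered. In practice, deciding when two calls are "the same" SNP is the harder part when no reference genome is available, and the sketch does not address it.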
Pages: 13
Related papers
50 in total
  • [21] Identification of CNAs from RNA-Seq data
    Iwamoto, Eisuke
    Sanada, Masashi
    Yasuda, Takahiko
    CANCER SCIENCE, 2022, 113 : 1446 - 1446
  • [22] Impact of gene annotation choice on the quantification of RNA-seq data
    David Chisanga
    Yang Liao
    Wei Shi
    BMC Bioinformatics, 23
  • [23] Synthetic data sets for the identification of key ingredients for RNA-seq differential analysis
    Rigaill, Guillem
    Balzergue, Sandrine
    Brunaud, Veronique
    Blondet, Eddy
    Rau, Andrea
    Rogier, Odile
    Caius, Jose
    Maugis-Rabusseau, Cathy
    Soubigou-Taconnat, Ludivine
    Aubourg, Sebastien
    Lurin, Claire
    Martin-Magniette, Marie-Laure
    Delannoy, Etienne
    BRIEFINGS IN BIOINFORMATICS, 2018, 19 (01) : 65 - 76
  • [24] Genome Sequence of a Potential New Benyvirus Isolated from Mango RNA-seq Data
    Sela, Noa
    Luria, Neta
    Yaari, Mor
    Prusky, Dov
    Dombrovsky, Aviv
    GENOME ANNOUNCEMENTS, 2016, 4 (06)
  • [25] Differential expression analysis for paired RNA-seq data
    Chung, Lisa M.
    Ferguson, John P.
    Zheng, Wei
    Qian, Feng
    Bruno, Vincent
    Montgomery, Ruth R.
    Zhao, Hongyu
    BMC BIOINFORMATICS, 2013, 14 : 110
  • [26] Differential expression analysis for paired RNA-seq data
    Lisa M Chung
    John P Ferguson
    Wei Zheng
    Feng Qian
    Vincent Bruno
    Ruth R Montgomery
    Hongyu Zhao
    BMC Bioinformatics, 14
  • [27] Development of Genome-Wide SNP Markers for Barley via Reference-Based RNA-Seq Analysis
    Tanaka, Tsuyoshi
    Ishikawa, Goro
    Ogiso-Tanaka, Eri
    Yanagisawa, Takashi
    Sato, Kazuhiro
    FRONTIERS IN PLANT SCIENCE, 2019, 10
  • [28] SplAdder: identification, quantification and testing of alternative splicing events from RNA-Seq data
    Kahles, Andre
    Ong, Cheng Soon
    Zhong, Yi
    Ratsch, Gunnar
    BIOINFORMATICS, 2016, 32 (12) : 1840 - 1847
  • [29] Prediction and Quantification of Splice Events from RNA-Seq Data
    Goldstein, Leonard D.
    Cao, Yi
    Pau, Gregoire
    Lawrence, Michael
    Wu, Thomas D.
    Seshagiri, Somasekar
    Gentleman, Robert
    PLOS ONE, 2016, 11 (05):
  • [30] BADGE: A novel Bayesian model for accurate abundance quantification and differential analysis of RNA-Seq data
    Gu, Jinghua
    Wang, Xiao
    Halakivi-Clarke, Leena
    Clarke, Robert
    Xuan, Jianhua
    BMC BIOINFORMATICS, 2014, 15