SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence

Cited by: 54
Authors
Lopez-Maestre, Helene [1,2,3,4]
Brinza, Lilia [5]
Marchet, Camille [6,7]
Kielbassa, Janice [8]
Bastien, Sylvere [1,2,3,4]
Boutigny, Mathilde [1,2,3,4]
Monnin, David [1,2,3]
El Filali, Adil [1,2,3]
Carareto, Claudia Marcia [9]
Vieira, Cristina [1,2,3,4]
Picard, Franck [1,2,3]
Kremer, Natacha [1,2,3]
Vavre, Fabrice [1,2,3,4]
Sagot, Marie-France [1,2,3,4]
Lacroix, Vincent [1,2,3,4]
Affiliations
[1] Univ Lyon, F-69000 Lyon, France
[2] Univ Lyon 1, F-69622 Villeurbanne, France
[3] CNRS, UMR5558, Lab Biometrie & Biol Evolut, F-69622 Villeurbanne, France
[4] EPI ERABLE, Inria Grenoble Rhone-Alpes, France
[5] BIOASTER, PT Genom & Transcriptom, Lyon, France
[6] Univ Rennes, F-35000 Rennes, France
[7] IRISA, Equipe GenScale, Rennes, France
[8] Univ Lyon 1, Ctr Leon Berard, Synergie Lyon Canc, Lyon, France
[9] UNESP Sao Paulo State Univ, Dept Biol, Sao Paulo, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil; European Research Council
Keywords
ALIGNMENT; TRANSCRIPTOME; POLYMORPHISM; OOGENESIS; EFFICIENT; VARIANTS; UNCOVERS; PROGRAM
DOI
10.1093/nar/gkw655
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline codes
071010; 081704
Abstract
SNPs (Single Nucleotide Polymorphisms) are genetic markers whose precise identification is a prerequisite for association studies. Methods to identify them are currently well developed for model species, but they rely on the availability of a (good) reference genome and therefore cannot be applied to non-model species. They are also mostly tailored for whole-genome (re-)sequencing experiments, whereas in many cases transcriptome sequencing can be used as a cheaper alternative that already makes it possible to identify SNPs located in transcribed regions. In this paper, we propose a method that identifies, quantifies and annotates SNPs without any reference genome, using RNA-seq data only. Individuals can be pooled prior to sequencing if not enough material is available from a single individual. Using pooled human RNA-seq data, we assess the precision and recall of our method and discuss them with respect to other methods that use a reference genome or an assembled transcriptome. We then experimentally validate the predictions of our method using RNA-seq data from two non-model species. The method can be used for any species to annotate SNPs and predict their impact on the protein sequence. We further make it possible to test for the association of the identified SNPs with a phenotype of interest.
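At its simplest, predicting a SNP's impact on the protein sequence amounts to translating the codon that carries the SNP with each allele and comparing the two amino acids. The Python sketch below illustrates only that final step; it is not the authors' method, and the abstract does not say how their pipeline locates the coding frame. It assumes the coding sequence (CDS), its reading frame and the 0-based SNP position are already known (for instance from an assembled transcript) and uses the standard genetic code. The names `classify_snp` and `CODON_TABLE` and the effect labels are hypothetical.

```python
# Illustrative sketch (not the paper's pipeline): classify a SNP that falls in a
# known coding sequence by translating the affected codon with each allele.
# Assumptions: standard genetic code (NCBI table 1), forward-strand CDS that
# starts at a codon boundary, and a 0-based SNP position within the CDS.

BASES = "TCAG"
# Amino acids for all 64 codons, enumerated in TCAG order for the
# first, second and third codon positions ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(cds: str, pos: int, alt: str) -> tuple[str, str, str]:
    """Return (effect, ref_aa, alt_aa) for replacing cds[pos] with alt.

    Codons containing ambiguous bases such as N raise KeyError.
    """
    cds, alt = cds.upper(), alt.upper()
    if len(alt) != 1 or alt not in BASES:
        raise ValueError("alternative allele must be a single A, C, G or T")
    # Only positions inside a complete codon can be translated.
    if not 0 <= pos < len(cds) - len(cds) % 3:
        raise ValueError("SNP position must fall inside a complete codon")
    if alt == cds[pos]:
        raise ValueError("alternative allele is identical to the reference base")

    start = pos - pos % 3  # first base of the codon carrying the SNP
    offset = pos - start   # position of the SNP within that codon (0, 1 or 2)
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]

    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "stop_gained"
    elif ref_aa == "*":
        effect = "stop_lost"
    else:
        effect = "missense"
    return effect, ref_aa, alt_aa


# Toy CDS: ATG GAA GAT TGA (Met-Glu-Asp-stop)
cds = "ATGGAAGATTGA"
print(classify_snp(cds, 5, "G"))  # GAA -> GAG: ('synonymous', 'E', 'E')
print(classify_snp(cds, 3, "T"))  # GAA -> TAA: ('stop_gained', 'E', '*')
print(classify_snp(cds, 1, "C"))  # ATG -> ACG: ('missense', 'M', 'T')
```

The sketch compares translated amino acids only, so it does not capture effects outside the coding region (for example on splicing or UTRs). In a reference-free setting, the CDS and frame it takes as input would themselves have to be inferred before such a classification can be applied.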
Pages: 13