Transcriptomic SNP discovery for custom genotyping arrays: Impacts of sequence data, SNP calling method and genotyping technology on the probability of validation success

Cited by: 12
Authors
Humble E. [1 ,2 ]
Thorne M.A.S. [2 ]
Forcada J. [2 ]
Hoffman J.I. [1 ]
Affiliations
[1] Department of Animal Behaviour, University of Bielefeld, Postfach 100131, Bielefeld
[2] British Antarctic Survey, High Cross, Madingley Road, Cambridge
Funding
Natural Environment Research Council (UK)
Keywords
Antarctic fur seal; Arctocephalus gazella; Illumina HiSeq sequencing; Marine mammal; Roche 454 sequencing; Single nucleotide polymorphism; Transcriptome; Validation success
DOI
10.1186/s13104-016-2209-x
Abstract
Background: Single nucleotide polymorphism (SNP) discovery is an important goal of many studies. However, the number of 'putative' SNPs discovered from a sequence resource may not provide a reliable indication of the number that will successfully validate with a given genotyping technology. For this it may be necessary to account for factors such as the method used for SNP discovery and the type of sequence data from which it originates, suitability of the SNP flanking sequences for probe design, and genomic context. To explore the relative importance of these and other factors, we used Illumina sequencing to augment an existing Roche 454 transcriptome assembly for the Antarctic fur seal (Arctocephalus gazella). We then mapped the raw Illumina reads to the new hybrid transcriptome using BWA and BOWTIE2 before calling SNPs with GATK. The resulting markers were pooled with two existing sets of SNPs called from the original 454 assembly using NEWBLER and SWAP454. Finally, we explored the extent to which SNPs discovered using these four methods overlapped and predicted the corresponding validation outcomes for both Illumina Infinium iSelect HD and Affymetrix Axiom arrays.
Results: Collating markers across all discovery methods resulted in a global list of 34,718 SNPs. However, concordance between the methods was surprisingly poor, with only 51.0% of SNPs being discovered by more than one method and 13.5% being called from both the 454 and Illumina datasets. Using a predictive modeling approach, we could also show that SNPs called from the Illumina data were on average more likely to successfully validate, as were SNPs called by more than one method. Above and beyond this pattern, predicted validation outcomes were also consistently better for Affymetrix Axiom arrays.
Conclusions: Our results suggest that focusing on SNPs called by more than one method could potentially improve validation outcomes. They also highlight possible differences between alternative genotyping technologies that could be explored in future studies of non-model organisms. © 2016 The Author(s).
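
The two concordance figures in the Results (the share of SNPs found by more than one method, and the share called from both the 454 and Illumina data) could be tallied roughly as sketched below. This is only an illustration and not the authors' pipeline: the function name `concordance`, the dictionary keys, the `(contig, position)` SNP keys, and the toy call sets are all assumptions made for the example.

```python
from collections import Counter


def concordance(calls, platform_of):
    """Summarise overlap between SNP call sets.

    calls       -- hypothetical mapping: method name -> set of (contig, position) keys
    platform_of -- hypothetical mapping: method name -> sequencing platform label
    """
    # Count how many methods support each distinct SNP.
    support = Counter()
    for snps in calls.values():
        support.update(snps)

    total = len(support)
    if total == 0:
        return {"total": 0, "multi_method_pct": 0.0, "cross_platform_pct": 0.0}

    # SNPs reported by at least two discovery methods.
    multi = sum(1 for n in support.values() if n > 1)

    # Pool the call sets per platform, then intersect across platforms.
    per_platform = {}
    for method, snps in calls.items():
        per_platform.setdefault(platform_of[method], set()).update(snps)
    shared = set.intersection(*per_platform.values())

    return {
        "total": total,
        "multi_method_pct": 100.0 * multi / total,
        "cross_platform_pct": 100.0 * len(shared) / total,
    }


# Toy data only; these coordinates are invented for illustration.
calls = {
    "newbler": {("c1", 10), ("c1", 55), ("c2", 7)},
    "swap454": {("c1", 10), ("c3", 3)},
    "bwa_gatk": {("c1", 10), ("c2", 7), ("c4", 20)},
    "bowtie2_gatk": {("c2", 7), ("c4", 20), ("c5", 1)},
}
platform_of = {
    "newbler": "454",
    "swap454": "454",
    "bwa_gatk": "illumina",
    "bowtie2_gatk": "illumina",
}

result = concordance(calls, platform_of)
print(result)
```

Keying each SNP by contig and position assumes that all call sets refer to the same reference coordinates; calls made against different assemblies would first need to be placed on a common coordinate system.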
Related papers
24 in total
  • [1] CNV discovery using SNP genotyping arrays
    Yau, C.
    Holmes, C. C.
    CYTOGENETIC AND GENOME RESEARCH, 2008, 123 (1-4) : 307 - 312
  • [2] Automated SNP Genotype Clustering Algorithm to Improve Data Completeness in High-Throughput SNP Genotyping Datasets from Custom Arrays
    Edward M. Smith
    Jack Littrell
    Michael Olivier
    Genomics Proteomics & Bioinformatics, 2007, (Z1) : 256 - 259
  • [3] SNP genotyping on pooled DNAs: comparison of genotyping technologies and a semi automated method for data storage and analysis
    Le Hellard, S
    Ballereau, SJ
    Visscher, PM
    Torrance, HS
    Pinson, J
    Morris, SW
    Thomson, ML
    Semple, CAM
    Muir, WJ
    Blackwood, DHR
    Porteous, DJ
    Evans, KL
    NUCLEIC ACIDS RESEARCH, 2002, 30 (15) : e74
  • [4] Genotyping-in-Thousands by sequencing (GT-seq): A cost effective SNP genotyping method based on custom amplicon sequencing
    Campbell, Nathan R.
    Harmon, Stephanie A.
    Narum, Shawn R.
    MOLECULAR ECOLOGY RESOURCES, 2015, 15 (04) : 855 - 867
  • [5] SNP discovery in wild and domesticated populations of blue catfish, Ictalurus furcatus, using genotyping-by-sequencing and subsequent SNP validation
    Li, Chao
    Waldbieser, Geoff
    Bosworth, Brian
    Beck, Benjamin H.
    Thongda, Wilawan
    Peatman, Eric
    MOLECULAR ECOLOGY RESOURCES, 2014, 14 (06) : 1261 - 1270
  • [6] Identification and validation of copy number variants using SNP genotyping arrays from a large clinical cohort
    Armand Valsesia
    Brian J Stevenson
    Dawn Waterworth
    Vincent Mooser
    Peter Vollenweider
    Gérard Waeber
    C Victor Jongeneel
    Jacques S Beckmann
    Zoltán Kutalik
    Sven Bergmann
    BMC Genomics, 13
  • [7] Identification and validation of copy number variants using SNP genotyping arrays from a large clinical cohort
    Valsesia, Armand
    Stevenson, Brian J.
    Waterworth, Dawn
    Mooser, Vincent
    Vollenweider, Peter
    Waeber, Gerard
    Jongeneel, C. Victor
    Beckmann, Jacques S.
    Kutalik, Zoltan
    Bergmann, Sven
    BMC GENOMICS, 2012, 13
  • [8] Evaluation of SNP calling using single and multiple-sample calling algorithms by validation against array base genotyping and Mendelian inheritance
    Kumar P.
    Al-Shafai M.
    Al Muftah W.A.
    Chalhoub N.
    Elsaid M.F.
    Aleem A.A.
    Suhre K.
    BMC Research Notes, 7 (1)
  • [9] Low-cost ddRAD method of SNP discovery and genotyping applied to the periwinkle Littorina saxatilis
    Kess, Tony
    Gross, Jeffrey
    Harper, Fiona
    Boulding, Elizabeth G.
    JOURNAL OF MOLLUSCAN STUDIES, 2016, 82 : 104 - 109
  • [10] A Method for Checking Genomic Integrity in Cultured Cell Lines from SNP Genotyping Data
    Danecek, Petr
    McCarthy, Shane A.
    Durbin, Richard
    PLOS ONE, 2016, 11 (05):