Transcriptomic SNP discovery for custom genotyping arrays: Impacts of sequence data, SNP calling method and genotyping technology on the probability of validation success

Cited by: 12
Authors
Humble E. [1, 2]
Thorne M.A.S. [2 ]
Forcada J. [2 ]
Hoffman J.I. [1 ]
Affiliations
[1] Department of Animal Behaviour, University of Bielefeld, Postfach 100131, Bielefeld
[2] British Antarctic Survey, High Cross, Madingley Road, Cambridge
Funding
Natural Environment Research Council (UK)
Keywords
Antarctic fur seal; Arctocephalus gazella; Illumina HiSeq sequencing; Marine mammal; Roche 454 sequencing; Single nucleotide polymorphism; Transcriptome; Validation success
DOI
10.1186/s13104-016-2209-x
Abstract
Background: Single nucleotide polymorphism (SNP) discovery is an important goal of many studies. However, the number of 'putative' SNPs discovered from a sequence resource may not provide a reliable indication of the number that will successfully validate with a given genotyping technology. For this it may be necessary to account for factors such as the method used for SNP discovery and the type of sequence data from which it originates, suitability of the SNP flanking sequences for probe design, and genomic context. To explore the relative importance of these and other factors, we used Illumina sequencing to augment an existing Roche 454 transcriptome assembly for the Antarctic fur seal (Arctocephalus gazella). We then mapped the raw Illumina reads to the new hybrid transcriptome using BWA and BOWTIE2 before calling SNPs with GATK. The resulting markers were pooled with two existing sets of SNPs called from the original 454 assembly using NEWBLER and SWAP454. Finally, we explored the extent to which SNPs discovered using these four methods overlapped and predicted the corresponding validation outcomes for both Illumina Infinium iSelect HD and Affymetrix Axiom arrays.

Results: Collating markers across all discovery methods resulted in a global list of 34,718 SNPs. However, concordance between the methods was surprisingly poor, with only 51.0% of SNPs being discovered by more than one method and 13.5% being called from both the 454 and Illumina datasets. Using a predictive modeling approach, we could also show that SNPs called from the Illumina data were on average more likely to successfully validate, as were SNPs called by more than one method. Above and beyond this pattern, predicted validation outcomes were also consistently better for Affymetrix Axiom arrays.

Conclusions: Our results suggest that focusing on SNPs called by more than one method could potentially improve validation outcomes. They also highlight possible differences between alternative genotyping technologies that could be explored in future studies of non-model organisms.

© 2016 The Author(s).
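The concordance figures reported in the Results (the share of SNPs found by more than one method, and the share called from both the 454 and Illumina datasets) can be illustrated with a minimal Python sketch. This is not the authors' code. The function name `concordance`, the `(contig, position)` representation of a SNP, and the toy call sets are all assumptions made for illustration. Only the pairing of methods with sequencing platforms follows the abstract.

```python
from collections import Counter


def concordance(calls, platform_of):
    """Summarise overlap between SNP call sets (illustrative helper, hypothetical).

    calls:       dict mapping method name -> set of (contig, position) tuples
    platform_of: dict mapping method name -> sequencing platform label
    Returns (total SNPs, fraction called by >1 method,
             fraction called on every platform).
    """
    all_snps = set().union(*calls.values())
    if not all_snps:
        return 0, 0.0, 0.0

    # Count how many methods called each SNP.
    n_methods = Counter(snp for snps in calls.values() for snp in snps)
    multi_method = {snp for snp, n in n_methods.items() if n > 1}

    # Pool call sets per platform, then intersect across platforms.
    by_platform = {}
    for method, snps in calls.items():
        by_platform.setdefault(platform_of[method], set()).update(snps)
    cross_platform = set.intersection(*by_platform.values())

    total = len(all_snps)
    return total, len(multi_method) / total, len(cross_platform) / total


# Toy data only: these positions are invented, not taken from the study.
calls = {
    "BWA+GATK": {("c1", 10), ("c1", 50), ("c2", 7)},
    "BOWTIE2+GATK": {("c1", 10), ("c2", 7), ("c3", 3)},
    "NEWBLER": {("c1", 10), ("c4", 99)},
    "SWAP454": {("c4", 99), ("c5", 5)},
}
platform_of = {
    "BWA+GATK": "Illumina",
    "BOWTIE2+GATK": "Illumina",
    "NEWBLER": "454",
    "SWAP454": "454",
}

total, multi_frac, both_frac = concordance(calls, platform_of)
print(total, multi_frac, both_frac)
```

Keying each SNP by contig and position assumes that all call sets share one reference coordinate system. If calls were made against different assemblies, coordinates would first need to be reconciled.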