HACSim: an R package to estimate intraspecific sample sizes for genetic diversity assessment using haplotype accumulation curves

Cited by: 16
Authors
Phillips, Jarrett D. [1]
French, Steven H. [1]
Hanner, Robert H. [2]
Gillis, Daniel J. [1]
Affiliations
[1] Univ Guelph, Sch Comp Sci, Guelph, ON, Canada
[2] Univ Guelph, Biodivers Inst Ontario, Dept Integrat Biol, Guelph, ON, Canada
Keywords
Algorithm; DNA barcoding; Extrapolation; Iterative method; Sampling sufficiency; Species; SPECIES DELIMITATION; DNA; BARCODE; IDENTIFICATION; MODELS; NUMBER; WILL; LIFE
DOI
10.7717/peerj-cs.243
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Assessing levels of standing genetic variation within species requires robust sampling to support accurate specimen identification using molecular techniques such as DNA barcoding; however, statistical estimators for what constitutes a robust sample are currently lacking. Such estimates are needed because most species are currently represented by only one or a few sequences in existing databases and can therefore safely be assumed to be undersampled. Unfortunately, the sample sizes of 5-10 specimens per species typically seen in DNA barcoding studies are often insufficient to adequately capture within-species genetic diversity. Here, we introduce a novel iterative extrapolation simulation algorithm of haplotype accumulation curves, called HACSim (Haplotype Accumulation Curve Simulator), that can be employed to calculate the likely sample sizes needed to observe the full range of DNA barcode haplotype variation that exists for a species. Using uniform and non-uniform haplotype frequency distributions, the notion of sampling sufficiency (the sample size at which sampling accuracy is maximized and above which no new sampling information is likely to be gained) can be gleaned. HACSim can be employed in two primary ways to estimate specimen sample sizes: (1) to simulate haplotype sampling in hypothetical species, and (2) to simulate haplotype sampling in real species mined from public reference sequence databases such as the Barcode of Life Data Systems (BOLD) or GenBank, for any genomic marker of interest. While our algorithm is globally convergent, runtime is heavily dependent on initial sample sizes and on the skewness of the corresponding haplotype frequency distribution.
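The idea of iteratively extrapolating a haplotype accumulation curve can be illustrated with a minimal sketch. It is written in Python rather than R and is not HACSim's interface or its exact algorithm: the function names, the parameters (`n_start`, `p`, `perms`, `max_iter`) and the linear update rule `n * h_star / h` are all assumptions made for illustration. The sketch draws specimens from a given haplotype frequency distribution, averages the number of distinct haplotypes seen over many random permutations, and enlarges the sample size until the mean curve recovers at least a fraction `p` of the known haplotypes.

```python
import math
import random


def mean_accumulation_curve(probs, n, perms, rng):
    """Mean number of distinct haplotypes seen after 1..n sampled specimens."""
    haplotypes = list(range(len(probs)))
    totals = [0] * n
    for _ in range(perms):
        # One simulated sampling order of n specimens.
        draws = rng.choices(haplotypes, weights=probs, k=n)
        seen = set()
        for i, hap in enumerate(draws):
            seen.add(hap)
            totals[i] += len(seen)
    return [t / perms for t in totals]


def estimate_sample_size(probs, n_start, p=0.95, perms=1000, max_iter=100, seed=1):
    """Grow n until the curve endpoint reaches a fraction p of all haplotypes.

    Assumed update rule (illustrative only): n_next = ceil(n * h_star / h),
    where h is the mean number of haplotypes observed with n specimens.
    """
    rng = random.Random(seed)
    h_star = len(probs)  # total number of haplotypes assumed to exist
    n, h = n_start, 0.0
    for _ in range(max_iter):
        h = mean_accumulation_curve(probs, n, perms, rng)[-1]
        if h / h_star >= p:
            break
        # Always move forward by at least one specimen.
        n = max(n + 1, math.ceil(n * h_star / h))
    return n, h


# Hypothetical species: 10 equally frequent haplotypes, 5 specimens in hand.
uniform = [0.1] * 10
n_needed, h_seen = estimate_sample_size(uniform, n_start=5)
print(n_needed, round(h_seen, 2))
```

With a skewed frequency vector in place of `uniform`, rare haplotypes take longer to appear, so the loop needs more iterations and larger samples, which matches the abstract's remark that runtime depends on the initial sample size and on the skewness of the distribution.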
Pages: 1-37
Page count: 37