The effect of prevalence and its interaction with sample size on the reliability of species distribution models

Times cited: 97
Authors
Jimenez-Valverde, A. [1, 2]
Lobo, J. M. [3]
Hortal, J. [4]
Affiliations
[1] Univ Kansas, Nat Hist Museum, Lawrence, KS 66045 USA
[2] Univ Kansas, Biodivers Res Ctr, Lawrence, KS 66045 USA
[3] CSIC, Museo Nacl Ciencias Nat, Dpto Biodiversidad & Biol Evolutiva, E-28006 Madrid, Spain
[4] Univ London Imperial Coll Sci Technol & Med, Div Biol, Ctr Populat Biol, NERC, Ascot SL5 7PY, Berks, England
Funding
Natural Environment Research Council (NERC), UK
Keywords
Logistic regression; Model accuracy; Prevalence; Sample size; Species distribution modelling; MACROTHELE-CALPEIANA ARANEAE; PREDICTIVE PERFORMANCE; POTENTIAL DISTRIBUTION; ENVIRONMENTAL NICHE; LOGISTIC-REGRESSION; PRESENCE-ABSENCE; CONSERVATION; ACCURACY; BIODIVERSITY; SUITABILITY
DOI
10.1556/ComEc.10.2009.2.9
Chinese Library Classification (CLC) number
Q14 [Ecology (biological ecology)]
Discipline classification codes
071012; 0713
Abstract
Prevalence (the presence/absence ratio in the training data) is commonly thought to influence the reliability of species distribution model predictions, yet little is known about its precise impact. We studied its effects using a virtual species, which avoids unaccounted-for effects in the modeling process (false absences, non-explanatory predictors, etc.). We sampled the distribution of the virtual species to obtain several data subsets of varying sample size and prevalence, and then modeled each subset using logistic regression. Our results show that model predictions can be highly accurate over a wide range of sample sizes and prevalence values, provided that the predictors are truly related to the distribution of the species and the training data are reliable. The effect of sample size becomes apparent for datasets of fewer than 70 data points, and the effect of prevalence is significant only for extremely unbalanced samples (prevalence < 0.01 or > 0.99). There is also a strong interaction between sample size and prevalence, indicating that the most detrimental factor is the sample size of each event (absences and/or presences), and not biased prevalence, as has previously been assumed. We suggest that, with real-world data, an interaction must exist between the sample size of each event and the quality of the training data. We discuss how biased prevalence can be a desirable property of the data rather than a problem to be avoided, and we point out the importance of using the best possible absence data when modeling the distribution of species with narrow geographic ranges.
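The sampling design summarized in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' code: the two-predictor virtual species and its coefficients (`TRUE_COEFS`), the population and test-set sizes, the gradient-ascent fitter, and the use of AUC as the accuracy measure are all hypothetical choices made for illustration. Prevalence is treated here as the proportion of presences in a training subset, which matches the 0.01 to 0.99 range quoted above.

```python
import math
import random

# Hypothetical virtual species (an assumption, not the paper's species):
# presence probability is a known logistic function of two predictors in [0, 1].
TRUE_COEFS = (-10.0, 10.0, 10.0)  # intercept, slope for x1, slope for x2


def logistic(z):
    """Numerically stable logistic function (no overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(coefs, x1, x2):
    """Predicted probability of presence for a linear logit."""
    b0, b1, b2 = coefs
    return logistic(b0 + b1 * x1 + b2 * x2)


def make_population(size, rng):
    """Return (x1, x2, y) triples, with y drawn from the true response curve."""
    pop = []
    for _ in range(size):
        x1, x2 = rng.random(), rng.random()
        y = 1 if rng.random() < predict(TRUE_COEFS, x1, x2) else 0
        pop.append((x1, x2, y))
    return pop


def sample_subset(pop, n, prevalence, rng):
    """Draw n records containing round(n * prevalence) presences."""
    presences = [r for r in pop if r[2] == 1]
    absences = [r for r in pop if r[2] == 0]
    n_pres = round(n * prevalence)
    return rng.sample(presences, n_pres) + rng.sample(absences, n - n_pres)


def fit_logistic(data, lr=0.5, epochs=2000):
    """Unregularized logistic regression fitted by batch gradient ascent."""
    b = [0.0, 0.0, 0.0]
    m = len(data)
    for _ in range(epochs):
        g = [0.0, 0.0, 0.0]
        for x1, x2, y in data:
            err = y - predict(b, x1, x2)  # gradient of the log-likelihood
            g[0] += err
            g[1] += err * x1
            g[2] += err * x2
        b = [bi + lr * gi / m for bi, gi in zip(b, g)]
    return tuple(b)


def auc(scored):
    """AUC: probability that a random presence outscores a random absence."""
    pos = [s for s, y in scored if y == 1]
    neg = [s for s, y in scored if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def evaluate(n, prevalence, seed=0):
    """Fit on one (n, prevalence) subset; score AUC on an independent test set."""
    rng = random.Random(seed)
    train = sample_subset(make_population(5000, rng), n, prevalence, rng)
    coefs = fit_logistic(train)
    test = make_population(1000, rng)
    return auc([(predict(coefs, x1, x2), y) for x1, x2, y in test])
```

Calling `evaluate(n, prevalence)` over a grid of, say, n in {20, 70, 500} and prevalence in {0.05, 0.5, 0.95} gives a rough analogue of the experimental design. Note that with small n and extreme prevalence, `round(n * prevalence)` can be 0 (or n), leaving a subset with no presences (or no absences), which is one way the per-event sample size discussed above can dominate. AUC depends only on how the fitted model ranks sites, so it does not register the intercept shift that unbalanced sampling induces in logistic regression; a threshold-dependent measure would be needed to see that effect.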
Pages: 196-205
Number of pages: 10
Related papers
  • [1] The effect of prevalence and its interaction with sample size on the reliability of species distribution models
    A. Jiménez-Valverde
    J. M. Lobo
    J. Hortal
    [J]. Community Ecology, 2009, 10 : 196 - 205
  • [2] Effects of sample size on accuracy of species distribution models
    Stockwell, DRB
    Peterson, AT
    [J]. ECOLOGICAL MODELLING, 2002, 148 (01) : 1 - 13
  • [3] Effects of sample size on the performance of species distribution models
    Wisz, M. S.
    Hijmans, R. J.
    Li, J.
    Peterson, A. T.
    Graham, C. H.
    Guisan, A.
    [J]. DIVERSITY AND DISTRIBUTIONS, 2008, 14 (05) : 763 - 773
  • [4] Assessing the effect of sample bias correction in species distribution models
    Dubos, Nicolas
    Preau, Clementine
    Lenormand, Maxime
    Papuga, Guillaume
    Monsarrat, Sophie
    Denelle, Pierre
    Le Louarn, Marine
    Heremans, Stien
    May, Roel
    Roche, Philip
    Luque, Sandra
    [J]. ECOLOGICAL INDICATORS, 2022, 145
  • [5] The effect of sample size and species characteristics on performance of different species distribution modeling methods
    Hernandez, Pilar A.
    Graham, Catherine H.
    Master, Lawrence L.
    Albert, Deborah L.
    [J]. ECOGRAPHY, 2006, 29 (05) : 773 - 785
  • [6] The effects of small sample size and sample bias on threshold selection and accuracy assessment of species distribution models
    Bean, William T.
    Stafford, Robert
    Brashares, Justin S.
    [J]. ECOGRAPHY, 2012, 35 (03) : 250 - 258
  • [7] The effect of sample size on the accuracy of species distribution models: considering both presences and pseudo-absences or background sites
    Liu, Canran
    Newell, Graeme
    White, Matt
    [J]. ECOGRAPHY, 2019, 42 (03) : 535 - 548
  • [8] Effect of sample size and length of observation period on the reliability of apparent pig organ lesion prevalence
    Gertz, M.
    Krieter, J.
    [J]. PREVENTIVE VETERINARY MEDICINE, 2021, 188
  • [9] How sample size can effect landslide size distribution
    Li L.
    Lan H.
    Wu Y.
    [J]. Geoenvironmental Disasters, 3 (1)
  • [10] Optimising occurrence data in species distribution models: sample size, positional uncertainty, and sampling bias matter
    Moudry, Vitezslav
    Bazzichetto, Manuele
    Remelgado, Ruben
    Devillers, Rodolphe
    Lenoir, Jonathan
    Mateo, Ruben G.
    Lembrechts, Jonas J.
    Sillero, Neftali
    Lecours, Vincent
    Cord, Anna F.
    Bartak, Vojtech
    Balej, Petr
    Rocchini, Duccio
    Torresani, Michele
    Arenas-Castro, Salvador
    Man, Matej
    Prajzlerova, Dominika
    Gdulova, Katerina
    Prosek, Jiri
    Marchetto, Elisa
    Zarzo-Arias, Alejandra
    Gabor, Lukas
    Leroy, Francois
    Martini, Matilde
    Malavasi, Marco
    Cazzolla Gatti, Roberto
    Wild, Jan
    Simova, Petra
    [J]. ECOGRAPHY, 2024