The effect of prevalence and its interaction with sample size on the reliability of species distribution models

Cited by: 97
Authors
Jimenez-Valverde, A. [1, 2]
Lobo, J. M. [3]
Hortal, J. [4]
Affiliations
[1] Univ Kansas, Nat Hist Museum, Lawrence, KS 66045 USA
[2] Univ Kansas, Biodivers Res Ctr, Lawrence, KS 66045 USA
[3] CSIC, Museo Nacl Ciencias Nat, Dpto Biodiversidad & Biol Evolutiva, E-28006 Madrid, Spain
[4] Univ London Imperial Coll Sci Technol & Med, Div Biol, Ctr Populat Biol, NERC, Ascot SL5 7PY, Berks, England
Funding
Natural Environment Research Council (NERC), UK
Keywords
Logistic regression; Model accuracy; Prevalence; Sample size; Species distribution modelling; MACROTHELE-CALPEIANA ARANEAE; PREDICTIVE PERFORMANCE; POTENTIAL DISTRIBUTION; ENVIRONMENTAL NICHE; LOGISTIC-REGRESSION; PRESENCE-ABSENCE; CONSERVATION; ACCURACY; BIODIVERSITY; SUITABILITY;
DOI
10.1556/ComEc.10.2009.2.9
Chinese Library Classification (CLC) number
Q14 [Ecology (bioecology)]
Discipline classification code
071012; 0713
Abstract
Prevalence (the presence/absence ratio in the training data) is commonly thought to influence the reliability of the predictions of species distribution models. However, little is known about its precise impact. We studied its effects using a virtual species, avoiding the presence of unaccounted-for effects in the modeling process (false absences, non-explanatory predictors, etc.). We sampled the distribution of the virtual species to obtain several data subsets of varying sample size and prevalence, and then modeled these data subsets using logistic regressions. Our results show that model predictions can be highly accurate over a wide range of sample sizes and prevalence scores, provided that the predictors are truly related to the distribution of the species and the training data are reliable. The effect of sample size becomes apparent for datasets of less than 70 data points, and the effect of prevalence is significant only for datasets with extremely unbalanced samples (< 0.01 and > 0.99). There is also a strong interaction between sample size and prevalence, indicating that the most negative factor is the sample size of each event (absence and/or presence), and not biased prevalence, as previously thought. We suggest that, in the real world, an interaction must exist between the sample size of each event and the quality of the training data. We discuss that biased prevalences can be a desirable property of the data, instead of a problem to be avoided, also pointing out the importance of using the best absence data possible when modeling the distribution of species of narrow geographic range.
Pages: 196-205
Number of pages: 10
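The sampling design summarized in the abstract (draw training subsets of a chosen sample size and prevalence from a virtual species, fit a logistic regression, then score the predictions) can be sketched as follows. This is a minimal illustration and not the authors' code: the virtual species, its two predictors and coefficients, the gradient-ascent fitter, the use of AUC as the accuracy measure, and every function name are assumptions made for the example. Prevalence is treated here as the proportion of presences in the training subset.

```python
import numpy as np

def make_virtual_species(n_cells=20000, seed=0):
    """Hypothetical virtual species: presence depends on two predictors only."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_cells, 2))
    logit = -3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1]  # assumed "true" response
    p = 1.0 / (1.0 + np.exp(-logit))
    y = (rng.random(n_cells) < p).astype(int)
    return X, y

def sample_subset(X, y, n, prevalence, rng):
    """Draw n cells with the requested proportion of presences."""
    n_pres = int(round(n * prevalence))
    n_abs = n - n_pres
    if n_pres < 1 or n_abs < 1:
        raise ValueError("need at least one presence and one absence")
    pres = rng.choice(np.flatnonzero(y == 1), size=n_pres, replace=False)
    absn = rng.choice(np.flatnonzero(y == 0), size=n_abs, replace=False)
    idx = np.concatenate([pres, absn])
    return X[idx], y[idx]

def fit_logistic(X, y, n_iter=2000, lr=0.5, l2=1e-3):
    """Logistic regression by gradient ascent; a small L2 penalty keeps the
    weights finite when a tiny subset happens to be perfectly separable."""
    A = np.column_stack([np.ones(len(X)), X])  # add intercept column
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(A @ w, -30, 30)))
        w += lr * (A.T @ (y - p) / len(y) - l2 * w)
    return w

def predict(w, X):
    """Predicted probability of presence for each row of X."""
    A = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-np.clip(A @ w, -30, 30)))

def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U); ties are not averaged, which is
    acceptable here because the scores are continuous."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def run_experiment(n, prevalence, seed=1):
    """Train on one subset and score against the whole virtual landscape."""
    X, y = make_virtual_species()
    rng = np.random.default_rng(seed)
    Xs, ys = sample_subset(X, y, n, prevalence, rng)
    w = fit_logistic(Xs, ys)
    return auc(predict(w, X), y)

if __name__ == "__main__":
    for n in (30, 100, 500):
        for prev in (0.1, 0.5, 0.9):
            print(f"n={n:4d} prevalence={prev:.1f} AUC={run_experiment(n, prev):.3f}")
```

Repeating `run_experiment` over many seeds per combination would give a distribution of scores rather than a single value, which is closer in spirit to comparing reliability across sample sizes and prevalences.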
Related papers
50 items in total
  • [41] Evaluating the reliability of species distribution models with an indirect measure of bird reproductive performance
    Aizpurua, Olatz
    Cantu-Salazar, Lisette
    San Martin, Gilles
    Sarda-Palomera, Francesc
    Gargallo, Gabriel
    Herrando, Sergi
    Brotons, Lluis
    Titeux, Nicolas
    [J]. JOURNAL OF AVIAN BIOLOGY, 2017, 48 (12) : 1575 - 1582
  • [42] The predictive performances of random forest models with limited sample size and different species traits
    Luan, Jing
    Zhang, Chongliang
    Xu, Binduo
    Xue, Ying
    Ren, Yiping
    [J]. FISHERIES RESEARCH, 2020, 227
  • [43] Effects of sample size and network depth on a deep learning approach to species distribution modeling
    Benkendorf, Donald J.
    Hawkins, Charles P.
    [J]. ECOLOGICAL INFORMATICS, 2020, 60
  • [44] Effect of nanoparticle size and its distribution on the dyeability of polypropylene
    Mani, G
    Fan, QG
    Ugbolue, SC
    Eiff, IM
    [J]. AATCC REVIEW, 2003, 3 (01) : 22 - 26
  • [45] Aggregate restructuring and its effect on the aggregate size distribution
    Gmachowski, L
    [J]. COLLOIDS AND SURFACES A-PHYSICOCHEMICAL AND ENGINEERING ASPECTS, 2002, 207 (1-3) : 271 - 277
  • [46] STABILITY OF A CRYSTALLIZER AND ITS EFFECT ON CRYSTAL SIZE DISTRIBUTION
    WADA, T
    KATOH, N
    [J]. KAGAKU KOGAKU RONBUNSHU, 1994, 20 (03) : 418 - 422
  • [47] Effects of Population Distribution, Sample Size and Correlation Structure on Huberty's Effect Size R
    Hittner, James B.
    [J]. JOURNAL OF MODERN APPLIED STATISTICAL METHODS, 2009, 8 (01) : 95 - 99
  • [48] Lignite Particle Size Distribution and Its Effect on Briquetting
    Guan, Jun
    He, Demin
    Li, Yunshan
    Zhang, Qiumin
    [J]. ADVANCES IN CHEMICAL ENGINEERING II, PTS 1-4, 2012, 550-553 : 506 - 510
  • [49] Sample Size Determination for Interval Estimation of the Prevalence of a Sensitive Attribute Under Randomized Response Models
    Qiu, Shi-Fang
    Tang, Man-Lai
    Tao, Ji-Ran
    Wong, Ricky S.
    [J]. PSYCHOMETRIKA, 2022, 87 (04) : 1361 - 1389