Treating gaps and biases in biodiversity data as a missing data problem

Authors
Bowler, Diana E. [1]
Boyd, Robin J. [1]
Callaghan, Corey T. [2]
Robinson, Robert A. [3]
Isaac, Nick J. B. [1]
Pocock, Michael J. O. [1]
Affiliations
[1] UK Ctr Ecol & Hydrol, Maclean Bldg, Benson Lane, Wallingford OX10 8BB, England
[2] Univ Florida, Dept Wildlife Ecol & Conservat, Ft Lauderdale Res & Educ Ctr, 3205 Coll Ave, Davie, FL 33314 USA
[3] British Trust Ornithol, The Nunnery, Norfolk IP24 2PU, England
Funding
UK Natural Environment Research Council
Keywords
biodiversity change; citizen science; ecological modelling; macroecology; spatial bias; CITIZEN SCIENCE PROJECTS; SAMPLE SELECTION BIAS; STATISTICAL-INFERENCE; ABUNDANCE INDEX; MODELS; PATTERNS; COMPLETENESS; INFORMATION; POPULATIONS; FRAMEWORK;
DOI
10.1111/brv.13127
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Big biodiversity data sets have great potential for monitoring and research because of their large taxonomic, geographic and temporal scope. Such data sets have become especially important for assessing temporal changes in species' populations and distributions. Gaps in the available data, especially spatial and temporal gaps, often mean that the data are not representative of the target population. This hinders drawing large-scale inferences, such as about species' trends, and may lead to misplaced conservation action. Here, we conceptualise gaps in biodiversity monitoring data as a missing data problem, which provides a unifying framework for the challenges and potential solutions across different types of biodiversity data sets. We characterise the typical types of data gaps as different classes of missing data and then use missing data theory to explore the implications for questions about species' trends and factors affecting occurrences/abundances. By using this framework, we show that bias due to data gaps can arise when the factors affecting sampling and/or data availability overlap with those affecting species. But a data set per se is not biased. The outcome depends on the ecological question and statistical approach, which determine choices around which sources of variation are taken into account. We argue that typical approaches to long-term species trend modelling using monitoring data are especially susceptible to data gaps since such models do not tend to account for the factors driving missingness. To identify general solutions to this problem, we review empirical studies and use simulation studies to compare some of the most frequently employed approaches to deal with data gaps, including subsampling, weighting and imputation. All these methods have the potential to reduce bias but may come at the cost of increased uncertainty of parameter estimates. 
Weighting techniques are arguably the least used so far in ecology and have the potential to reduce both the bias and variance of parameter estimates. Regardless of the method, the ability to reduce bias critically depends on knowledge of, and the availability of data on, the factors creating data gaps. We use this review to outline the necessary considerations when dealing with data gaps at different stages of the data collection and analysis workflow.
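The weighting idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' method or simulation: it is a toy inverse-probability-weighting example under stated assumptions. The two habitat strata, the sampling probabilities, the visit counts and all names (`Visit`, `SAMPLING_PROB`, `ipw_mean`) are hypothetical. The same factor (habitat) drives both where observers sample and where the species occurs, which is the overlap the abstract identifies as the source of bias.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    habitat: str   # stratum the sampled site belongs to
    occupied: int  # 1 if the species was recorded, 0 otherwise


# Hypothetical, assumed-known probabilities that a site in each habitat
# ends up in the data set (urban sites are heavily over-sampled).
SAMPLING_PROB = {"urban": 0.5, "rural": 0.05}


def make_sample():
    """Build a deterministic toy sample.

    Assumed landscape: 200 urban sites (20% occupied) and 800 rural sites
    (60% occupied), so the true occupancy is (40 + 480) / 1000 = 0.52.
    Sampling 50% of urban and 5% of rural sites gives 100 + 40 visits.
    """
    urban = [Visit("urban", 1)] * 20 + [Visit("urban", 0)] * 80
    rural = [Visit("rural", 1)] * 24 + [Visit("rural", 0)] * 16
    return urban + rural


def naive_mean(visits):
    """Unweighted occupancy estimate; ignores how sites entered the data."""
    return sum(v.occupied for v in visits) / len(visits)


def ipw_mean(visits, sampling_prob):
    """Inverse-probability-weighted occupancy estimate.

    Each visit is weighted by 1 / P(site sampled), so sites from
    under-sampled habitats count for more.
    """
    weights = [1.0 / sampling_prob[v.habitat] for v in visits]
    weighted_sum = sum(w * v.occupied for w, v in zip(weights, visits))
    return weighted_sum / sum(weights)


sample = make_sample()
naive = naive_mean(sample)
weighted = ipw_mean(sample, SAMPLING_PROB)
print(f"naive estimate:    {naive:.3f}")
print(f"weighted estimate: {weighted:.3f}")
```

In this toy setting the naive estimate is pulled towards the over-sampled urban stratum, while the weighted estimate recovers the assumed landscape-wide occupancy. The correction works here only because the sampling probabilities are taken as known, which mirrors the abstract's point that reducing bias depends on knowledge of the factors creating the data gaps.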
Pages: 18