Imputation of incomplete large-scale monitoring count data via penalized estimation

Cited by: 3
Authors
Dakki, Mohamed [1]
Robin, Genevieve [2]
Suet, Marie [3]
Qninba, Abdeljebbar [1]
El Agbani, Mohammed A. [1]
Ouassou, Asmaa [1]
El Hamoumi, Rhimou [4]
Azafzaf, Hichem [5]
Rebah, Sami [5]
Feltrup-Azafzaf, Claudia [5]
Hamouda, Naoufel [5]
Ibrahim, Wed A. L. [6]
Asran, Hosni H. [6]
Elhady, Amr A. [6]
Ibrahim, Haitham [6]
Etayeb, Khaled [7, 9]
Bouras, Essam [8, 9]
Saied, Almokhtar [8, 9]
Glidan, Ashrof [8, 9]
Habib, Bakar M. [10]
Sayoud, Mohamed S. [11]
Bendjedda, Nadjiba [12]
Dami, Laura [3]
Deschamps, Clemence [3]
Gaget, Elie [3, 13]
Mondain-Monval, Jean-Yves [14]
Defos du Rau, Pierre [14]
Affiliations
[1] Univ Mohammed V Rabat, Inst Sci, Rabat, Morocco
[2] Univ Evry Val Essonne, CNRS, LaMME, Evry, France
[3] Ctr Rech Tour Valat, Arles, France
[4] Univ Hassan 2, Fac Sci Ben Msik, Casablanca, Morocco
[5] Assoc Amis Oiseaux AAO BirdLife Tunisie, Ariana, Tunisia
[6] Egyptian Environm Affairs Agcy, Cairo, Egypt
[7] Tripoli Univ, Zool Dept, Tripoli, Libya
[8] Environm Gen Author, Tripoli, Libya
[9] Libyan Soc Birds, Tripoli, Libya
[10] Conservat Forets Wilaya Oran, Oran, Algeria
[11] Direct Gen Forets, Ctr Cyneget Reghaia, Algiers, Algeria
[12] Direct Gen Forets, Algiers, Algeria
[13] Int Inst Appl Syst Anal IIASA, Laxenburg, Austria
[14] Off Francais Biodivers, Unite Avifaune Migratrice, Arles, France
Source
METHODS IN ECOLOGY AND EVOLUTION | 2021 / Vol. 12 / Issue 06
Keywords
biodiversity monitoring; high-dimensional statistics; incomplete count data; missing data imputation; penalized estimation; waterbird trends in North Africa
DOI
10.1111/2041-210X.13594
Chinese Library Classification (CLC) number
Q14 [Ecology (Bioecology)]
Discipline classification code
071012; 0713
Abstract
In biodiversity monitoring, large datasets are becoming more and more widely available and are increasingly used globally to estimate species trends and conservation status. These large-scale datasets challenge existing statistical analysis methods, many of which are not adapted to their size, incompleteness and heterogeneity. The development of scalable methods to impute missing data in incomplete large-scale monitoring datasets is crucial to balance sampling in time or space and thus better inform conservation policies. We developed a new method based on penalized Poisson models to impute and analyse incomplete monitoring data in a large-scale framework. The method allows parameterization of (a) space and time factors, (b) the main effects of predictor covariates, as well as (c) space-time interactions. It also benefits from robust statistical and computational capability in large-scale settings. The method was tested extensively on both simulated and real-life waterbird data, with the findings revealing that it outperforms six existing methods in terms of missing data imputation errors. Applying the method to 16 waterbird species, we estimated their long-term trends for the first time at the entire North African scale, a region where monitoring data suffer from many gaps in space and time series. This new approach opens promising perspectives to increase the accuracy of species-abundance trend estimations. We made it freely available in the R package 'lori' and recommend its use for large-scale count data, particularly in citizen science monitoring programmes.
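To illustrate the kind of model the abstract describes, the following is a minimal NumPy sketch of imputing a site-by-year count matrix with a penalized Poisson model. It is not the authors' implementation and does not use the 'lori' package. The abstract does not state the form of the penalty, so the nuclear-norm penalty on the site-year interaction matrix is an assumption made here for illustration, and covariate main effects are left out. All function names, the default `lam`, and the fixed step size are hypothetical.

```python
import numpy as np


def soft_threshold_svd(M, tau):
    """Proximal operator of tau * nuclear norm: shrink singular values by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def fit_penalized_poisson(Y, lam=5.0, n_iter=2000):
    """Fit log E[Y_ij] = mu + a_i + b_j + T_ij on the observed cells of Y.

    Y      : 2-D array of counts (sites x years); np.nan marks a missing count.
    lam    : nuclear-norm penalty on the interaction matrix T (assumed form).
    Returns the matrix of fitted Poisson means for every cell.
    """
    Y = np.asarray(Y, dtype=float)
    observed = ~np.isnan(Y)
    Y0 = np.where(observed, Y, 0.0)          # zeros are masked out below
    n, m = Y.shape
    mu = np.log(max(Y0[observed].mean(), 1e-8))  # fixed intercept
    a, b, T = np.zeros(n), np.zeros(m), np.zeros((n, m))
    step = 1.0 / (3.0 * (Y0.max() + 1.0))    # conservative fixed step size
    for _ in range(n_iter):
        X = mu + a[:, None] + b[None, :] + T
        # Gradient of the Poisson negative log-likelihood, observed cells only.
        G = observed * (np.exp(np.minimum(X, 30.0)) - Y0)
        a -= step * G.mean(axis=1)           # site (space) effects
        b -= step * G.mean(axis=0)           # year (time) effects
        T = soft_threshold_svd(T - step * G, step * lam)  # low-rank interaction
    X = mu + a[:, None] + b[None, :] + T
    return np.exp(np.minimum(X, 30.0))


def impute(Y, **kwargs):
    """Keep observed counts and fill missing cells with fitted Poisson means."""
    Y = np.asarray(Y, dtype=float)
    fitted = fit_penalized_poisson(Y, **kwargs)
    return np.where(np.isnan(Y), fitted, Y)


# Simulated example: 12 sites, 8 years, about 30% of counts missing.
rng = np.random.default_rng(0)
site = np.linspace(-0.5, 0.5, 12)
year = np.linspace(-0.5, 0.5, 8)
counts = rng.poisson(np.exp(2.0 + site[:, None] + year[None, :])).astype(float)
counts[rng.random(counts.shape) < 0.3] = np.nan
completed = impute(counts, lam=5.0)
```

A larger `lam` shrinks the interaction term toward zero, leaving a purely additive site-plus-year model. In practice the penalty level would be chosen by a data-driven procedure such as cross-validation.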
Pages: 1031 - 1039
Page count: 9