Performance of statistical and machine learning-based methods for predicting biogeographical patterns of fungal productivity in forest ecosystems

Cited by: 13
Authors
Morera, Albert [1, 2]
Martinez de Aragon, Juan [3]
Bonet, Jose Antonio [1, 2]
Liang, Jingjing [4]
De-Miguel, Sergio [1, 2]
Affiliations
[1] Univ Lleida, Dept Crop & Forest Sci, Av Alcalde Rovira Roure 191, E-25198 Lleida, Spain
[2] CTFC AGROTECNIO CERCA Ctr, Joint Res Unit, Av Rovira Roure 191, Lleida 25198, Spain
[3] Forest Sci & Technol Ctr Catalonia, Ctra St Llorenc Morunys Km 2, Solsona 25280, Spain
[4] Purdue Univ, Dept Forestry & Nat Resources, Forest Adv Comp & Artificial Intelligence Lab, W Lafayette, IN 47907 USA
Keywords
Modeling; Regression; Biogeography; Climate; Forest; Fungi; Mushrooms
DOI
10.1186/s40663-021-00297-w
Chinese Library Classification number
S7 [Forestry]
Discipline classification code
0829; 0907
Abstract
Background: The prediction of biogeographical patterns from a large number of driving factors with complex interactions, correlations and non-linear dependences requires advanced analytical methods and modeling tools. This study compares different statistical and machine learning-based models for predicting biogeographical patterns of fungal productivity, as a case study for thoroughly assessing how well alternative modeling approaches provide accurate and ecologically consistent predictions.
Methods: We evaluated and compared the performance of two statistical modeling techniques, namely generalized linear mixed models and geographically weighted regression, and four techniques based on different machine learning algorithms, namely random forest, extreme gradient boosting, support vector machine and artificial neural network, to predict fungal productivity. Model evaluation followed a systematic methodology combining random, spatial and environmental blocking with an assessment of the ecological consistency of spatially explicit model predictions according to scientific knowledge.
Results: Fungal productivity predictions were sensitive to the modeling approach and the number of predictors used. Moreover, the importance assigned to different predictors varied between machine learning modeling approaches. Decision tree-based models increased prediction accuracy by more than 10% compared to other machine learning approaches, and by more than 20% compared to statistical models, and resulted in higher ecological consistency of the predicted biogeographical patterns of fungal productivity.
Conclusions: Decision tree-based models were the best approach for prediction both in sampling-like environments and in extrapolation beyond the spatial and climatic range of the modeling data. In this study, we show that proper variable selection is crucial to create robust models for extrapolation in biophysically differentiated areas. Proper variable selection reduces the dimensions of the ecosystem space described by the model predictors, resulting in higher similarity between the modeling data and the environmental conditions over the whole study area. When dealing with spatial-temporal data in the analysis of biogeographical patterns, environmental blocking is proposed as a highly informative cross-validation technique for assessing the prediction error over larger scales.
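The contrast between random and environmental blocking mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the single predictor `precip`, the response `productivity`, the quantile-based block assignment and the straight-line model are all assumptions chosen only to keep the example self-contained, and spatial blocking is omitted.

```python
import random
import statistics


def make_random_blocks(n_samples, n_blocks, seed=0):
    """Assign samples to folds at random (conventional k-fold)."""
    rng = random.Random(seed)
    shuffled = list(range(n_samples))
    rng.shuffle(shuffled)
    blocks = [0] * n_samples
    for position, i in enumerate(shuffled):
        blocks[i] = position % n_blocks
    return blocks


def make_environmental_blocks(values, n_blocks):
    """Assign samples to folds by quantile of one environmental variable,
    so each held-out fold covers conditions absent from the training data."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    blocks = [0] * len(values)
    for rank, i in enumerate(order):
        blocks[i] = rank * n_blocks // len(values)
    return blocks


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def blocked_cv_rmse(x, y, blocks):
    """Leave-one-block-out cross-validation; returns the pooled RMSE."""
    squared_errors = []
    for held_out in sorted(set(blocks)):
        train = [i for i, b in enumerate(blocks) if b != held_out]
        test = [i for i, b in enumerate(blocks) if b == held_out]
        intercept, slope = fit_line([x[i] for i in train], [y[i] for i in train])
        squared_errors.extend((y[i] - (intercept + slope * x[i])) ** 2 for i in test)
    return statistics.fmean(squared_errors) ** 0.5


# Synthetic, hypothetical data: a non-linear response to one climate variable.
rng = random.Random(42)
precip = [rng.uniform(0, 10) for _ in range(200)]
productivity = [0.5 * p ** 2 + rng.gauss(0, 1) for p in precip]

rmse_random = blocked_cv_rmse(precip, productivity, make_random_blocks(len(precip), 5))
rmse_env = blocked_cv_rmse(precip, productivity, make_environmental_blocks(precip, 5))
print(f"random blocking RMSE:        {rmse_random:.2f}")
print(f"environmental blocking RMSE: {rmse_env:.2f}")
```

In this toy setting the environmentally blocked error is larger, because some folds force the model to extrapolate beyond the range of conditions it was trained on. That is the kind of extrapolation error a random split tends to hide.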
Pages: 14
Related papers
50 in total
  • [1] Performance of statistical and machine learning-based methods for predicting biogeographical patterns of fungal productivity in forest ecosystems
    Albert Morera
    Juan Martínez de Aragón
    José Antonio Bonet
    Jingjing Liang
    Sergio de-Miguel
    [J]. Forest Ecosystems, 2021, 8 (02) : 278 - 291
  • [2] Comprehensive assessment of machine learning-based methods for predicting antimicrobial peptides
    Xu, Jing
    Li, Fuyi
    Leier, Andre
    Xiang, Dongxu
    Shen, Hsin-Hui
    Lago, Tatiana T. Marquez
    Li, Jian
    Yu, Dong-Jun
    Song, Jiangning
    [J]. BRIEFINGS IN BIOINFORMATICS, 2021, 22 (05)
  • [3] Breaking barriers: a statistical and machine learning-based hybrid system for predicting dementia
    Javeed, Ashir
    Anderberg, Peter
    Ghazi, Ahmad Nauman
    Noor, Adeeb
    Elmstahl, Solve
    Berglund, Johan Sanmartin
    [J]. FRONTIERS IN BIOENGINEERING AND BIOTECHNOLOGY, 2024, 11
  • [4] Fungal diversities and community assembly processes show different biogeographical patterns in forest and grassland soil ecosystems
    Wang, Min
    Wang, Can
    Yu, Zhijun
    Wang, Hui
    Wu, Changhao
    Masoudi, Abolfazl
    Liu, Jingze
    [J]. FRONTIERS IN MICROBIOLOGY, 2023, 14
  • [5] A Combined Approach for Predicting Employees' Productivity based on Ensemble Machine Learning Methods
    Obiedat, Ruba
    Toubasi, Sara
    [J]. INFORMATICA-AN INTERNATIONAL JOURNAL OF COMPUTING AND INFORMATICS, 2022, 46 (05) : 49 - 58
  • [6] A review of machine learning-based methods for predicting drug-target interactions
    Shi, Wen
    Yang, Hong
    Xie, Linhai
    Yin, Xiao-Xia
    Zhang, Yanchun
    [J]. HEALTH INFORMATION SCIENCE AND SYSTEMS, 2024, 12 (01)
  • [7] Machine learning-based prediction and assessment of recent dynamics of forest net primary productivity in Romania
    Pravalie, Remus
    Niculita, Mihai
    Rosca, Bogdan
    Marin, Gheorghe
    Dumitrascu, Monica
    Patriche, Cristian
    Birsan, Marius-Victor
    Nita, Ion-Andrei
    Tiscovschi, Adrian
    Sirodoev, Igor
    Bandoc, Georgeta
    [J]. JOURNAL OF ENVIRONMENTAL MANAGEMENT, 2023, 334
  • [8] Machine learning and statistical methods for predicting mortality in heart failure
    Mpanya, Dineo
    Celik, Turgay
    Klug, Eric
    Ntsinjana, Hopewell
    [J]. HEART FAILURE REVIEWS, 2021, 26 (03) : 545 - 552
  • [9] Predicting the Duration of Forest Fires Using Machine Learning Methods
    Kopitsa, Constantina
    Tsoulos, Ioannis G.
    Charilogis, Vasileios
    Stavrakoudis, Athanassios
    [J]. Future Internet, 2024, 16 (11)