Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data

Cited by: 264
Authors
Schratz, Patrick [1]
Muenchow, Jannes [1]
Iturritxa, Eugenia [2]
Richter, Jakob [3]
Brenning, Alexander [1]
Affiliations
[1] GISci Grp, Dept Geog, Grietgasse 6, D-07743 Jena, Germany
[2] NEIKER, Apdo 46, Vitoria 01080, Arab, Spain
[3] TU Dortmund Univ, Dept Stat, Dortmund, Germany
Keywords
Spatial modeling; Machine-learning; Spatial autocorrelation; Hyperparameter tuning; Spatial cross-validation; MODEL-SELECTION; LANDSLIDE SUSCEPTIBILITY; SPECIES DISTRIBUTION; CROSS-VALIDATION; PREDICTION; AUTOCORRELATION; CLASSIFICATION; OPTIMIZATION; CLASSIFIERS; CLIMATE;
DOI
10.1016/j.ecolmodel.2019.06.002
Chinese Library Classification (CLC) number
Q14 [Ecology (Bioecology)];
Discipline classification code
071012 ; 0713 ;
Abstract
While the application of machine-learning algorithms has become much simpler in recent years thanks to their well-documented integration in commonly used statistical programming languages (such as R or Python), several practical challenges remain in ecological modeling with respect to unbiased performance estimation. One is the influence of spatial autocorrelation on both hyperparameter tuning and performance estimation. Grouped cross-validation strategies have been proposed in recent years in environmental as well as medical contexts to reduce bias in predictive performance estimates. In this study we show the effects of spatial autocorrelation on hyperparameter tuning and performance estimation by comparing the predictive performance of several widely used machine-learning algorithms such as boosted regression trees (BRT), k-nearest neighbor (KNN), random forest (RF) and support vector machine (SVM) with traditional parametric algorithms such as logistic regression (GLM) and semi-parametric ones like generalized additive models (GAM). Spatial and non-spatial cross-validation methods were used to evaluate model performance, with the aim of obtaining bias-reduced performance estimates. A detailed analysis of the sensitivity of hyperparameter tuning to the resampling method (spatial/non-spatial) was performed. As a case study, the spatial distribution of a forest disease (Diplodia sapinea) in the Basque Country (Spain) was investigated using common environmental variables such as temperature, precipitation, soil and lithology as predictors. Random forest (mean Brier score estimate of 0.166) outperformed all other methods with regard to predictive accuracy. Though the sensitivity to hyperparameter tuning differed between the ML algorithms, there were in most cases no substantial differences between spatial and non-spatial partitioning for hyperparameter tuning. However, spatial hyperparameter tuning maintains consistency with spatial estimation of classifier performance and should be favored over non-spatial hyperparameter optimization. Large performance differences (up to 47%) between the bias-reduced (spatial cross-validation) and overoptimistic (non-spatial cross-validation) settings showed the strong need to account for the influence of spatial autocorrelation. Overoptimistic performance estimates may lead to misguided actions in ecological decision making based on biased model predictions.
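The contrast between spatial and non-spatial partitioning described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the spatial folds were built, so the square-block scheme below is an assumption chosen for simplicity, and the function names (`brier_score`, `random_folds`, `spatial_block_folds`) are hypothetical. Only the Brier score definition (mean squared difference between predicted probability and the observed 0/1 outcome) is standard.

```python
import random
from collections import defaultdict


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(y_prob, y_true)) / len(y_true)


def random_folds(n_points, k, seed=0):
    """Non-spatial partitioning: shuffle indices, then deal them into k folds."""
    idx = list(range(n_points))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def spatial_block_folds(coords, block_size, k):
    """Spatial partitioning: all points in the same square block share a fold.

    Blocks are assigned to folds round-robin in sorted order. This is an
    assumption of the sketch; clustering the coordinates is another option.
    """
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        blocks[(int(x // block_size), int(y // block_size))].append(i)
    folds = [[] for _ in range(k)]
    for j, key in enumerate(sorted(blocks)):
        folds[j % k].extend(blocks[key])
    return folds


# Hypothetical point coordinates: neighbouring points end up in the same fold,
# so a held-out fold is spatially separated from the training data.
coords = [(0.5, 0.5), (0.7, 0.2), (3.1, 0.4), (3.3, 0.9), (0.2, 3.5), (3.8, 3.6)]
spatial = spatial_block_folds(coords, block_size=2.0, k=2)
non_spatial = random_folds(len(coords), k=2)
```

With random folds, close neighbours of a test point usually remain in the training set, which is the mechanism behind the overoptimistic estimates the abstract reports. Using the same spatial folds in the inner tuning loop as in the outer performance loop matches the abstract's recommendation to keep tuning consistent with spatial performance estimation.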
Pages: 109-120
Page count: 12