Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data

Times cited: 264
Authors
Schratz, Patrick [1]
Muenchow, Jannes [1]
Iturritxa, Eugenia [2]
Richter, Jakob [3]
Brenning, Alexander [1]
Affiliations
[1] GISci Grp, Dept Geog, Grietgasse 6, D-07743 Jena, Germany
[2] NEIKER, Apdo 46, Vitoria 01080, Arab, Spain
[3] TU Dortmund Univ, Dept Stat, Dortmund, Germany
Keywords
Spatial modeling; Machine learning; Spatial autocorrelation; Hyperparameter tuning; Spatial cross-validation; Model selection; Landslide susceptibility; Species distribution; Cross-validation; Prediction; Autocorrelation; Classification; Optimization; Classifiers; Climate
DOI
10.1016/j.ecolmodel.2019.06.002
Chinese Library Classification number
Q14 [Ecology (biological ecology)]
Subject classification codes
071012; 0713
Abstract
While the application of machine-learning (ML) algorithms has been greatly simplified in recent years by their well-documented integration into commonly used statistical programming languages (such as R or Python), several practical challenges related to unbiased performance estimation remain in the field of ecological modeling. One is the influence of spatial autocorrelation on both hyperparameter tuning and performance estimation. Grouped cross-validation strategies have been proposed in recent years in environmental as well as medical contexts to reduce bias in estimates of predictive performance. In this study we show the effects of spatial autocorrelation on hyperparameter tuning and performance estimation by comparing the predictive performance of several widely used ML algorithms, such as boosted regression trees (BRT), k-nearest neighbor (KNN), random forest (RF) and support vector machine (SVM), with that of traditional parametric algorithms such as logistic regression (GLM) and semi-parametric ones such as generalized additive models (GAM). Spatial and non-spatial cross-validation methods were used to evaluate model performance, with the aim of obtaining bias-reduced performance estimates. A detailed analysis of the sensitivity of hyperparameter tuning to the resampling method (spatial/non-spatial) was performed. As a case study, the spatial distribution of a forest disease (Diplodia sapinea) in the Basque Country (Spain) was investigated using common environmental variables such as temperature, precipitation, soil and lithology as predictors. Random forest (mean Brier score estimate of 0.166) outperformed all other methods with regard to predictive accuracy. Although the sensitivity to hyperparameter tuning differed between the ML algorithms, in most cases there were no substantial differences between spatial and non-spatial partitioning for hyperparameter tuning.
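The two resampling schemes compared above, and the Brier score used to rank the models, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the spatial folds are formed by k-means clustering of the sample coordinates (one common way to build spatially grouped folds), and the Brier score assumes binary 0/1 outcomes (e.g. infected/not infected) paired with predicted probabilities.

```python
import random

def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def random_folds(n, k, seed=0):
    """Non-spatial CV: shuffle the indices, then deal them round-robin into k folds."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    folds = [0] * n
    for pos, i in enumerate(order):
        folds[i] = pos % k
    return folds

def spatial_folds(coords, k, n_iter=100, seed=0):
    """Spatial CV (assumed k-means variant): cluster (x, y) coordinates; each cluster is one fold."""
    rng = random.Random(seed)
    centers = rng.sample(coords, k)  # start from k distinct sample locations
    labels = [0] * len(coords)
    for _ in range(n_iter):
        # Assignment step: each location joins its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2)
            for x, y in coords
        ]
        # Update step: move each center to the mean of its members; keep it if the cluster is empty.
        new_centers = []
        for c in range(k):
            members = [pt for pt, lab in zip(coords, labels) if lab == c]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:  # converged: assignments will no longer change
            break
        centers = new_centers
    return labels
```

With random folds, neighbouring (and therefore autocorrelated) observations usually land on both sides of a train/test split; with clustered folds, whole regions are held out together, which is why spatial estimates tend to be less optimistic.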
However, spatial hyperparameter tuning maintains consistency with spatial estimation of classifier performance and should therefore be favored over non-spatial hyperparameter optimization. Large performance differences (up to 47%) between the bias-reduced (spatial cross-validation) and overoptimistic (non-spatial cross-validation) settings underline the strong need to account for the influence of spatial autocorrelation. Overoptimistic performance estimates may lead to misguided actions in ecological decision making that relies on biased model predictions.
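Keeping tuning consistent with performance estimation amounts to nested resampling in which both loops use spatial partitions: the outer folds estimate performance, and the inner folds, built only from each outer training set, select the hyperparameter. The sketch below is one hedged way to express this, not the study's setup: all names are hypothetical, KNN's number of neighbours stands in for the tuned hyperparameter because it is a single integer, and for brevity the spatial folds are simple contiguous stripes along the x coordinate rather than clusters. It requires Python 3.8+ for `math.dist`.

```python
import math

def stripe_folds(coords, k):
    """Hypothetical simple spatial partition: k contiguous stripes of (nearly) equal size along x."""
    order = sorted(range(len(coords)), key=lambda i: coords[i][0])
    folds = [0] * len(coords)
    for rank, i in enumerate(order):
        folds[i] = rank * k // len(coords)
    return folds

def knn_prob(train_X, train_y, x, k):
    """KNN probability estimate: share of positives among the k nearest training points."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def brier(y_true, p_pred):
    """Brier score for 0/1 outcomes and predicted probabilities."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def cv_brier(X, y, folds, k_nn):
    """Mean Brier score of KNN with k_nn neighbours across the given folds."""
    scores = []
    for f in sorted(set(folds)):
        train = [i for i in range(len(y)) if folds[i] != f]
        test = [i for i in range(len(y)) if folds[i] == f]
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        scores.append(brier([y[j] for j in test], [knn_prob(tX, ty, X[j], k_nn) for j in test]))
    return sum(scores) / len(scores)

def nested_spatial_cv(X, y, coords, k_grid, n_outer=4, n_inner=3):
    """Outer spatial folds estimate performance; inner spatial folds tune k on the outer training set only."""
    outer = stripe_folds(coords, n_outer)
    results = []
    for f in range(n_outer):
        train = [i for i in range(len(y)) if outer[i] != f]
        test = [i for i in range(len(y)) if outer[i] == f]
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        # Inner loop: spatial folds drawn from the outer training data, so the test stripe stays unseen.
        inner = stripe_folds([coords[i] for i in train], n_inner)
        best_k = min(k_grid, key=lambda k: cv_brier(tX, ty, inner, k))
        p = [knn_prob(tX, ty, X[j], best_k) for j in test]
        results.append((best_k, brier([y[j] for j in test], p)))
    return results
```

Each outer fold returns its selected `k` and its held-out Brier score; averaging the scores gives the performance estimate. Swapping `stripe_folds` for a random assignment in either loop reproduces the non-spatial variants compared in the study.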
Pages: 109-120
Number of pages: 12