Combination of Feature Selection and CatBoost for Prediction: The First Application to the Estimation of Aboveground Biomass

Cited by: 120
Authors
Luo, Mi [1]
Wang, Yifu [1]
Xie, Yunhong [1]
Zhou, Lai [1]
Qiao, Jingjing [1]
Qiu, Siyu [1]
Sun, Yujun [1]
Affiliations
[1] Beijing Forestry Univ, State Forestry Adm Key Lab Forest Resources & Env, Beijing 100083, Peoples R China
Source
FORESTS | 2021 / Vol. 12 / Issue 02
Funding
National Natural Science Foundation of China
Keywords
feature selection; machine learning algorithms; ensemble learning; CatBoost; XGBoost; forest type; FOREST BIOMASS; IMAGERY; CHINA; MODEL; CLASSIFICATION; SENTINEL-2; TEXTURE; AREA;
DOI
10.3390/f12020216
Chinese Library Classification number
S7 [Forestry]
Discipline classification code
0829; 0907
Abstract
Increasing numbers of explanatory variables tend to result in information redundancy and "dimensional disaster" in the quantitative remote sensing of forest aboveground biomass (AGB). Feature selection of model factors is an effective method for improving the accuracy of AGB estimates. Machine learning algorithms are also widely used in AGB estimation, although little research has addressed the use of the categorical boosting algorithm (CatBoost) for AGB estimation. Both feature selection and regression for AGB estimation models are typically performed with the same machine learning algorithm, but there is no evidence to suggest that this is the best method. Therefore, the present study focuses on evaluating the performance of the CatBoost algorithm for AGB estimation and comparing the performance of different combinations of feature selection methods and machine learning algorithms. AGB estimation models of four forest types were developed based on Landsat OLI data using three feature selection methods (recursive feature elimination (RFE), variable selection using random forests (VSURF), and least absolute shrinkage and selection operator (LASSO)) and three machine learning algorithms (random forest regression (RFR), extreme gradient boosting (XGBoost), and categorical boosting (CatBoost)). Feature selection had a significant influence on AGB estimation. RFE preserved the most informative features for AGB estimation and was superior to VSURF and LASSO. In addition, CatBoost improved the accuracy of the AGB estimation models compared with RFR and XGBoost. AGB estimation models using RFE for feature selection and CatBoost as the regression algorithm achieved the highest accuracy, with root mean square errors (RMSEs) of 26.54 Mg/ha for coniferous forest, 24.67 Mg/ha for broad-leaved forest, 22.62 Mg/ha for mixed forests, and 25.77 Mg/ha for all forests. 
The combination of RFE and CatBoost performed better than the VSURF-RFR combination, in which random forests were used for both feature selection and regression, indicating that feature selection and regression performed by a single machine learning algorithm may not always ensure optimal AGB estimation. Extending the application of new machine learning algorithms and feature selection methods is a promising way to improve the accuracy of AGB estimates.
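The decoupling described in the abstract (one algorithm ranks and removes features, a different one performs the regression, and RMSE scores the result) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the authors' implementation: ridge coefficients stand in as the RFE ranking criterion, a k-nearest-neighbour regressor stands in for CatBoost, the data are synthetic, and every function name is hypothetical.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ridge_coefs(X, y, cols, lam=1e-3):
    """Ridge coefficients (no intercept) using only the columns in `cols`."""
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in cols] for i in cols]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in cols]
    return solve(A, b)

def rfe(X, y, n_keep):
    """Recursive feature elimination: refit, drop the weakest feature, repeat.

    Assumes features are on comparable scales, so |coefficient| is a usable
    importance score (an assumption of this sketch).
    """
    mean_y = sum(y) / len(y)
    y_centered = [t - mean_y for t in y]
    cols = list(range(len(X[0])))
    while len(cols) > n_keep:
        w = ridge_coefs(X, y_centered, cols)
        weakest = min(range(len(cols)), key=lambda k: abs(w[k]))
        cols.pop(weakest)
    return cols

def knn_predict(X_train, y_train, x, cols, k=5):
    """Stand-in regressor: mean target of the k nearest training samples."""
    order = sorted(range(len(X_train)),
                   key=lambda i: sum((X_train[i][c] - x[c]) ** 2 for c in cols))
    return sum(y_train[i] for i in order[:k]) / k

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

# Synthetic data: six features, only features 0 and 2 drive the target.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(6)] for _ in range(200)]
y = [3 * r[0] - 2 * r[2] + random.gauss(0, 0.1) for r in X]
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]

# Step 1: select features with one model (ridge-based RFE, training data only).
selected = sorted(rfe(X_train, y_train, n_keep=2))

# Step 2: regress with a different model, with and without feature selection.
all_cols = list(range(6))
rmse_all = rmse(y_test, [knn_predict(X_train, y_train, x, all_cols) for x in X_test])
rmse_sel = rmse(y_test, [knn_predict(X_train, y_train, x, selected) for x in X_test])

print("selected features:", selected)
print("RMSE, all features:", round(rmse_all, 3))
print("RMSE, selected features:", round(rmse_sel, 3))
```

In practice the stand-ins would be replaced by the methods named in the abstract, with the selection step and the regression step each chosen and validated independently.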
Pages: 1 / 22
Page count: 21