Combination of Feature Selection and CatBoost for Prediction: The First Application to the Estimation of Aboveground Biomass

Times cited: 120
Authors
Luo, Mi [1]
Wang, Yifu [1]
Xie, Yunhong [1]
Zhou, Lai [1]
Qiao, Jingjing [1]
Qiu, Siyu [1]
Sun, Yujun [1]
Affiliation
[1] Beijing Forestry Univ, State Forestry Adm Key Lab Forest Resources & Env, Beijing 100083, Peoples R China
Source
FORESTS | 2021 / Vol. 12 / No. 2
Funding
National Natural Science Foundation of China
Keywords
feature selection; machine learning algorithms; ensemble learning; CatBoost; XGBoost; forest type; FOREST BIOMASS; IMAGERY; CHINA; MODEL; CLASSIFICATION; SENTINEL-2; TEXTURE; AREA;
DOI
10.3390/f12020216
Chinese Library Classification (CLC) number
S7 [Forestry]
Discipline classification codes
0829; 0907
Abstract
In the quantitative remote sensing of forest aboveground biomass (AGB), an increasing number of explanatory variables tends to cause information redundancy and the "curse of dimensionality". Feature selection of model factors is an effective way to improve the accuracy of AGB estimates. Machine learning algorithms are also widely used in AGB estimation, although little research has applied the categorical boosting algorithm (CatBoost) to this task. Feature selection and regression in AGB estimation models are typically performed with the same machine learning algorithm, but there is no evidence that this is the best approach. Therefore, the present study evaluates the performance of the CatBoost algorithm for AGB estimation and compares different combinations of feature selection methods and machine learning algorithms. AGB estimation models of four forest types were developed from Landsat OLI data using three feature selection methods (recursive feature elimination (RFE), variable selection using random forests (VSURF), and the least absolute shrinkage and selection operator (LASSO)) and three machine learning algorithms (random forest regression (RFR), extreme gradient boosting (XGBoost), and categorical boosting (CatBoost)). Feature selection had a significant influence on AGB estimation: RFE preserved the most informative features and outperformed VSURF and LASSO. In addition, CatBoost improved the accuracy of the AGB estimation models compared with RFR and XGBoost. Models using RFE for feature selection and CatBoost as the regression algorithm achieved the highest accuracy, with root mean square errors (RMSEs) of 26.54 Mg/ha for coniferous forest, 24.67 Mg/ha for broad-leaved forest, 22.62 Mg/ha for mixed forest, and 25.77 Mg/ha for all forests.
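For reference, the RMSE values reported above are assumed to follow the standard definition below. The symbols are introduced here for illustration and are not taken from the paper: $y_i$ is the observed AGB of evaluation plot $i$, $\hat{y}_i$ is the model estimate for that plot, and $n$ is the number of evaluation plots, so the RMSE carries the same unit as AGB (Mg/ha).

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
```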
The RFE-CatBoost combination outperformed the VSURF-RFR combination, in which random forests were used for both feature selection and regression, indicating that performing feature selection and regression with a single machine learning algorithm does not always yield optimal AGB estimates. Applying new machine learning algorithms and feature selection methods more widely is a promising way to further improve the accuracy of AGB estimates.
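The abstract's central point, that the feature selector and the final regressor need not be the same algorithm, can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the RFE step here ranks features by the absolute standardized coefficients of a near-least-squares fit (a tiny ridge term keeps it solvable) rather than by a tree ensemble, and a k-nearest-neighbour regressor stands in for CatBoost. All helper names (`linear_importance`, `rfe`, `knn_fit_predict`, `select_then_regress`, `rmse`) are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]
# Any "fit on training rows, predict test rows" function can act as the regressor.
FitPredict = Callable[[Matrix, Sequence[float], Matrix], List[float]]


def _solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def linear_importance(X: Matrix, y: Sequence[float], features: List[int],
                      ridge: float = 1e-6) -> List[float]:
    """Assumed importance measure: |coefficient| of a ridge fit on standardized columns."""
    cols = []
    for j in features:
        col = [row[j] for row in X]
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / len(col)) or 1.0  # constant column -> zeros
        cols.append([(v - mu) / sd for v in col])
    y_mu = sum(y) / len(y)
    yc = [v - y_mu for v in y]
    k = len(cols)
    xtx = [[sum(p * q for p, q in zip(cols[i], cols[j])) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(cols[i], yc)) for i in range(k)]
    return [abs(beta) for beta in _solve(xtx, xty)]


def rfe(X: Matrix, y: Sequence[float], n_keep: int,
        importance=linear_importance) -> List[int]:
    """Recursive feature elimination: refit, drop the weakest feature, repeat."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        scores = importance(X, y, remaining)
        weakest = min(range(len(remaining)), key=lambda i: scores[i])
        remaining.pop(weakest)
    return remaining


def knn_fit_predict(X_train: Matrix, y_train: Sequence[float], X_test: Matrix,
                    k: int = 3) -> List[float]:
    """Stand-in regressor: mean target of the k nearest training rows (Euclidean)."""
    preds = []
    for q in X_test:
        nearest = sorted(range(len(X_train)),
                         key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], q)))[:k]
        preds.append(sum(y_train[i] for i in nearest) / len(nearest))
    return preds


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between observations and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def select_then_regress(X_train: Matrix, y_train: Sequence[float], X_test: Matrix,
                        n_keep: int, fit_predict: FitPredict = knn_fit_predict):
    """Select features with one model (RFE), then regress with a different one."""
    keep = rfe(X_train, y_train, n_keep)

    def subset(rows: Matrix) -> Matrix:
        return [[row[j] for j in keep] for row in rows]

    return keep, fit_predict(subset(X_train), y_train, subset(X_test))


if __name__ == "__main__":
    # Synthetic data: y depends only on columns 0 and 2; column 1 is noise, column 3 is constant.
    X = [[float(i), float((7 * i) % 5), float((3 * i) % 11), 1.0] for i in range(20)]
    y = [2 * r[0] + 3 * r[2] for r in X]
    keep, preds = select_then_regress(X[:15], y[:15], X[15:], n_keep=2)
    print("kept features:", keep)
    print("hold-out RMSE:", round(rmse(y[15:], preds), 2))
```

On the synthetic data, RFE should keep columns 0 and 2. Because the regressor is just a parameter, it could be replaced by, for example, a small wrapper around the `catboost` package's `CatBoostRegressor` without touching the selection step; that separation is the kind of decoupling the RFE-CatBoost result points to.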
Pages: 1 / 22
Page count: 21