Combination of Feature Selection and CatBoost for Prediction: The First Application to the Estimation of Aboveground Biomass

Cited by: 120
Authors
Luo, Mi [1]
Wang, Yifu [1]
Xie, Yunhong [1]
Zhou, Lai [1]
Qiao, Jingjing [1]
Qiu, Siyu [1]
Sun, Yujun [1]
Affiliations
[1] Beijing Forestry Univ, State Forestry Adm Key Lab Forest Resources & Env, Beijing 100083, Peoples R China
Source
FORESTS | 2021 / Vol. 12 / Issue 02
Funding
National Natural Science Foundation of China;
Keywords
feature selection; machine learning algorithms; ensemble learning; CatBoost; XGBoost; forest type; FOREST BIOMASS; IMAGERY; CHINA; MODEL; CLASSIFICATION; SENTINEL-2; TEXTURE; AREA;
DOI
10.3390/f12020216
Chinese Library Classification (CLC) number
S7 [Forestry];
Discipline classification code
0829 ; 0907 ;
Abstract
In the quantitative remote sensing of forest aboveground biomass (AGB), a growing number of explanatory variables tends to produce information redundancy and the curse of dimensionality. Feature selection of model factors is an effective way to improve the accuracy of AGB estimates. Machine learning algorithms are also widely used in AGB estimation, although little research has addressed the use of the categorical boosting algorithm (CatBoost) for this purpose. Feature selection and regression in AGB estimation models are typically performed with the same machine learning algorithm, but there is no evidence that this is the best approach. The present study therefore evaluates the performance of CatBoost for AGB estimation and compares different combinations of feature selection methods and machine learning algorithms. AGB estimation models for four forest types were developed from Landsat OLI data using three feature selection methods (recursive feature elimination (RFE), variable selection using random forests (VSURF), and least absolute shrinkage and selection operator (LASSO)) and three machine learning algorithms (random forest regression (RFR), extreme gradient boosting (XGBoost), and categorical boosting (CatBoost)). Feature selection had a significant influence on AGB estimation. RFE preserved the most informative features for AGB estimation and was superior to VSURF and LASSO. In addition, CatBoost improved the accuracy of the AGB estimation models compared with RFR and XGBoost. Models using RFE for feature selection and CatBoost as the regression algorithm achieved the highest accuracy, with root mean square errors (RMSEs) of 26.54 Mg/ha for coniferous forest, 24.67 Mg/ha for broad-leaved forest, 22.62 Mg/ha for mixed forest, and 25.77 Mg/ha for all forests. The RFE-CatBoost combination outperformed the VSURF-RFR combination, in which random forests were used for both feature selection and regression, indicating that performing feature selection and regression with a single machine learning algorithm does not always yield the best AGB estimates. Extending the application of new machine learning algorithms and feature selection methods is a promising way to improve the accuracy of AGB estimates.
Pages: 1 / 22
Number of pages: 21
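The abstract's central point is that feature selection and regression need not be performed by the same algorithm. The following is a minimal sketch of that decoupling, not the authors' implementation. It assumes scikit-learn is available, uses synthetic data in place of the Landsat OLI predictors and field-measured AGB, and uses a random forest as the ranking model inside RFE (the abstract does not say which ranker was used). CatBoost is used as the final regressor when the `catboost` package is installed; otherwise scikit-learn's `GradientBoostingRegressor` is swapped in as a stand-in. All sample sizes, feature counts, and hyperparameters are illustrative assumptions.

```python
import math

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

try:
    from catboost import CatBoostRegressor
except ImportError:  # catboost is optional for this sketch
    CatBoostRegressor = None


def make_final_regressor():
    """Return CatBoost if installed, else a gradient boosting stand-in."""
    if CatBoostRegressor is not None:
        return CatBoostRegressor(
            iterations=200, random_seed=0, verbose=0, allow_writing_files=False
        )
    return GradientBoostingRegressor(random_state=0)


# Synthetic stand-in for remote sensing predictors (X) and plot-level AGB (y).
X, y = make_regression(
    n_samples=300, n_features=20, n_informative=5, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Step 1: feature selection with RFE, ranked by a random forest (assumption).
selector = RFE(
    RandomForestRegressor(n_estimators=50, random_state=0),
    n_features_to_select=5,
    step=1,
)
selector.fit(X_train, y_train)
selected = selector.support_  # boolean mask over the 20 candidate features

# Step 2: regression with a different algorithm on the selected features only.
model = make_final_regressor()
model.fit(X_train[:, selected], y_train)
pred = model.predict(X_test[:, selected])

# Step 3: evaluate with RMSE, the accuracy metric reported in the abstract.
rmse = math.sqrt(mean_squared_error(y_test, pred))
print(f"features kept: {int(selected.sum())}, test RMSE: {rmse:.2f}")
```

Keeping the selector and the final regressor as separate objects makes it straightforward to swap either one and compare combinations, which mirrors the comparison the abstract describes.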