Combination of Feature Selection and CatBoost for Prediction: The First Application to the Estimation of Aboveground Biomass

Cited by: 120
Authors
Luo, Mi [1]
Wang, Yifu [1]
Xie, Yunhong [1]
Zhou, Lai [1]
Qiao, Jingjing [1]
Qiu, Siyu [1]
Sun, Yujun [1]
Affiliations
[1] Beijing Forestry Univ, State Forestry Adm Key Lab Forest Resources & Env, Beijing 100083, Peoples R China
Source
FORESTS | 2021, Volume 12, Issue 02
Funding
National Natural Science Foundation of China;
Keywords
feature selection; machine learning algorithms; ensemble learning; CatBoost; XGBoost; forest type; FOREST BIOMASS; IMAGERY; CHINA; MODEL; CLASSIFICATION; SENTINEL-2; TEXTURE; AREA;
DOI
10.3390/f12020216
Chinese Library Classification
S7 [Forestry];
Discipline classification code
0829 ; 0907 ;
Abstract
Increasing numbers of explanatory variables tend to result in information redundancy and "dimensional disaster" in the quantitative remote sensing of forest aboveground biomass (AGB). Feature selection of model factors is an effective method for improving the accuracy of AGB estimates. Machine learning algorithms are also widely used in AGB estimation, although little research has addressed the use of the categorical boosting algorithm (CatBoost) for AGB estimation. Both feature selection and regression for AGB estimation models are typically performed with the same machine learning algorithm, but there is no evidence to suggest that this is the best method. Therefore, the present study focuses on evaluating the performance of the CatBoost algorithm for AGB estimation and comparing the performance of different combinations of feature selection methods and machine learning algorithms. AGB estimation models of four forest types were developed based on Landsat OLI data using three feature selection methods (recursive feature elimination (RFE), variable selection using random forests (VSURF), and least absolute shrinkage and selection operator (LASSO)) and three machine learning algorithms (random forest regression (RFR), extreme gradient boosting (XGBoost), and categorical boosting (CatBoost)). Feature selection had a significant influence on AGB estimation. RFE preserved the most informative features for AGB estimation and was superior to VSURF and LASSO. In addition, CatBoost improved the accuracy of the AGB estimation models compared with RFR and XGBoost. AGB estimation models using RFE for feature selection and CatBoost as the regression algorithm achieved the highest accuracy, with root mean square errors (RMSEs) of 26.54 Mg/ha for coniferous forest, 24.67 Mg/ha for broad-leaved forest, 22.62 Mg/ha for mixed forests, and 25.77 Mg/ha for all forests. 
The combination of RFE and CatBoost outperformed the VSURF-RFR combination, in which random forests were used for both feature selection and regression, indicating that feature selection and regression performed by a single machine learning algorithm may not always ensure optimal AGB estimation. Extending the application of new machine learning algorithms and feature selection methods is a promising way to improve the accuracy of AGB estimates.
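The two-stage idea in the abstract (select features with one method, then regress with a different algorithm, and score by RMSE) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic and unitless, an ordinary least-squares fit stands in for the ranking model inside recursive feature elimination, and a k-nearest-neighbours regressor stands in for CatBoost. All function names (`fit_ols`, `rfe`, `knn_predict`, `rmse`) are hypothetical helpers written for this example.

```python
import math
import random


def fit_ols(X, y):
    """Least-squares coefficients (no intercept) from the normal equations."""
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y], solved by Gauss-Jordan elimination.
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def rfe(X, y, n_keep):
    """Recursive feature elimination: refit, drop the weakest feature, repeat."""
    kept = list(range(len(X[0])))
    while len(kept) > n_keep:
        coefs = fit_ols([[row[j] for j in kept] for row in X], y)
        # Features share a common scale here, so |coefficient| ranks importance.
        weakest = min(range(len(kept)), key=lambda k: abs(coefs[k]))
        kept.pop(weakest)
    return kept


def knn_predict(X_train, y_train, x, k=5):
    """Second-stage regressor (a stand-in for CatBoost): mean of k nearest targets."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    nearest = sorted(range(len(X_train)), key=lambda i: sq_dist(X_train[i], x))[:k]
    return sum(y_train[i] for i in nearest) / k


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


# Synthetic data (assumption): only features 0 and 2 carry signal.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(200)]
y = [3 * r[0] - 2 * r[2] + rng.gauss(0, 0.5) for r in X]
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]

# Stage 1: feature selection. Stage 2: regression with a different algorithm.
selected = rfe(X_train, y_train, n_keep=2)
train_sel = [[r[j] for j in selected] for r in X_train]
test_sel = [[r[j] for j in selected] for r in X_test]
preds = [knn_predict(train_sel, y_train, x) for x in test_sel]

test_rmse = rmse(y_test, preds)
mean_y = sum(y_train) / len(y_train)
baseline_rmse = rmse(y_test, [mean_y] * len(y_test))  # predict-the-mean baseline
print(selected, round(test_rmse, 2), round(baseline_rmse, 2))
```

Keeping the selector and the regressor as separate functions mirrors the comparison described in the abstract: either stage can be swapped independently, and the held-out RMSE shows which pairing works best.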
Pages: 1 / 22
Number of pages: 21