Ensemble learning prediction model for rapeseed flowering periods incorporating virtual sample generation

Cited by: 0
Authors
Xie, Qianwei [1]
Xue, Fengchang [1]
Chen, Jianfei [2]
Affiliations
[1] Meteorological Disaster Geographic Information Engineering Laboratory, Nanjing University of Information Science & Technology, Nanjing, 210044, China
[2] Guangxi Zhuang Autonomous Region Lightning Protection Center, Nanning, 530000, China
Keywords
Decision trees
DOI
10.11975/j.issn.1002-6819.202404106
Abstract
Linear regression cannot fully capture the complex nonlinear relationships among the factors that influence flowering, and observed samples of the flowering period are scarce. This study proposes an ensemble learning model, combined with virtual sample generation, to predict the flowering period of rapeseed. Full-bloom records for rapeseed and meteorological data from Longyou County, Quzhou City, Zhejiang Province, China, covering 1998 to 2023, were used. The original samples were expanded with two methods, Gaussian Mixture Model-based Virtual Sample Generation (GMM-VSG) and cubic spline interpolation, yielding two new datasets of 985 samples each. Models were built with eight machine learning methods: Random Forest (RF), Kernel Ridge Regression (KRR), Ridge Regression (RR), Least Absolute Shrinkage and Selection Operator (Lasso), Support Vector Regression (SVR), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Gradient Boosting Decision Tree (GBDT). Hyperparameters were tuned with a Bayesian optimizer. Finally, a prediction model for the rapeseed flowering period was built using stacking ensemble learning. The vast majority of models performed better on the cubic interpolation (Cubic) dataset than on the original and GMM-VSG datasets. Specifically, the RF model achieved an RMSE of 0.679 d, an MAE of 0.351 d, and an R² of 0.990 on the Cubic dataset, a marked improvement over the original dataset (RMSE 6.286 d, MAE 5.028 d, R² 0.201) and the GMM-VSG dataset (RMSE 2.680 d, MAE 1.588 d, R² 0.881). The SVR model also performed better on the Cubic dataset, with an RMSE of 0.849 d, an MAE of 0.333 d, and an R² of 0.984. LightGBM, itself an ensemble method, performed best on the Cubic dataset, with the lowest RMSE of 0.613 d, an MAE of 0.336 d, and the highest R² of 0.992.
This result confirms LightGBM's strong feature learning and noise resistance in capturing the complex relationships within the dataset. In contrast, the Lasso and RR models showed no significant improvement on the Cubic dataset. For instance, Lasso had an RMSE of 3.879 d and an MAE of 3.054 d on the Cubic dataset. Although these errors are lower than on the original dataset (RMSE 6.329 d, MAE 5.567 d), a substantial gap to the other models remained. Five models were developed with the stacking ensemble learning approach: SRX_L, All_L, SLL_L, SRL_L, and SRK_L. Among them, SRX_L performed best, with the highest R² of 0.9997 and the lowest RMSE and MAE among all models, at 0.1227 d and 0.1056 d, respectively. In the fitted flowering periods, the predicted and actual trends were generally consistent. Predictive accuracy was high in most years, particularly 2001, 2011, and 2014, when the predictions matched the actual data with minimal discrepancies, sometimes less than 0.01 or even approaching zero. Some years showed larger differences, such as 1999 and 2023; the largest discrepancy, 0.4421 d, occurred in 1999. The maximum actual flowering period, 92 days, occurred in 2005, with an error of 0.0416 d between the predicted and actual values. The minimum actual flowering period, 63 days, was observed in 2020, with an error of 0.1325 d. The model can therefore be expected to predict extreme values with high accuracy. Virtual sample generation is also suitable for small datasets: it significantly improved the model's predictive accuracy and generalizability while reducing the cost and difficulty of data collection.
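The two sample-expansion ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the data are synthetic stand-ins rather than the Longyou County records, the helper names `expand_with_cubic_spline` and `expand_with_gmm` are hypothetical, and parameters such as `points_per_interval` and `n_components` are assumptions chosen only for illustration. The sketch does not attempt to reproduce the 985-sample datasets.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from sklearn.mixture import GaussianMixture


def expand_with_cubic_spline(years, values, points_per_interval=4):
    """Fit a cubic spline through yearly values and resample it on a denser grid."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    spline = CubicSpline(years, values)
    n_dense = (len(years) - 1) * points_per_interval + 1
    dense_years = np.linspace(years[0], years[-1], n_dense)
    return dense_years, spline(dense_years)


def expand_with_gmm(samples, n_virtual, n_components=2, seed=0):
    """Fit a Gaussian mixture to the real samples and draw virtual samples from it."""
    gmm = GaussianMixture(n_components=n_components, random_state=seed)
    gmm.fit(samples)
    virtual, _ = gmm.sample(n_virtual)  # sample() returns (points, component labels)
    return virtual


# Synthetic stand-in data (assumption: NOT the records used in the paper).
rng = np.random.default_rng(0)
years = np.arange(1998, 2024)
flowering_day = 78 + 6 * np.sin(years / 3.0) + rng.normal(0, 1.5, len(years))
mean_temp = 9.0 - 0.15 * (flowering_day - 78) + rng.normal(0, 0.3, len(years))

# Cubic spline interpolation: the original yearly points are kept, and
# intermediate virtual points are added between them.
dense_years, dense_days = expand_with_cubic_spline(years, flowering_day)

# GMM-based generation: virtual (temperature, flowering day) pairs are drawn
# from a mixture fitted to the joint distribution of the real pairs.
real = np.column_stack([mean_temp, flowering_day])
virtual = expand_with_gmm(real, n_virtual=200)

print(dense_years.shape, virtual.shape)
```

Interpolated points lie on a smooth curve through the observations, while GMM samples are random draws from a fitted density, so the two expanded datasets can differ considerably in how much noise they carry.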
Compared with single machine learning models, stacking ensemble learning can substantially improve predictive performance. It is well suited to complex tasks with nonlinear relationships, such as predicting the flowering period of rapeseed. © 2024 Chinese Society of Agricultural Engineering. All rights reserved.
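As a rough illustration of stacking, the following Python sketch trains several base regressors and a linear meta-learner with scikit-learn's `StackingRegressor`, then reports RMSE, MAE, and R². The choice of base learners (SVR, RF, GBDT) and of Ridge as the meta-learner is an assumption made for this example; the abstract does not specify the composition of SRX_L or the other stacked models, and the data here are synthetic.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for meteorological features and a flowering-period target
# (assumption: NOT the data used in the paper).
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = 78 + 5 * np.sin(X[:, 0]) + 3 * X[:, 1] * X[:, 2] + rng.normal(0, 0.5, 300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Base learners: an illustrative subset of the methods listed in the abstract.
base_learners = [
    ("svr", make_pipeline(StandardScaler(), SVR(C=10.0))),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("gbdt", GradientBoostingRegressor(random_state=0)),
]

# The meta-learner is trained on out-of-fold predictions of the base learners
# (5-fold cross-validation), which limits leakage from the training targets.
stack = StackingRegressor(
    estimators=base_learners,
    final_estimator=Ridge(alpha=1.0),
    cv=5,
)
stack.fit(X_train, y_train)
pred = stack.predict(X_test)

rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
mae = float(mean_absolute_error(y_test, pred))
r2 = float(r2_score(y_test, pred))
print(f"RMSE={rmse:.3f} d, MAE={mae:.3f} d, R2={r2:.3f}")
```

In practice, each base learner's hyperparameters would be tuned first (the abstract mentions a Bayesian optimizer), and the stacked model would be compared against every single model on the same held-out data.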
Pages: 159-167