Utilization of synthetic minority oversampling technique for improving potato yield prediction using remote sensing data and machine learning algorithms with small sample size of yield data

Cited by: 10
Authors
Ebrahimy, Hamid [1 ]
Wang, Yi [2 ]
Zhang, Zhou [1 ]
Affiliations
[1] Univ Wisconsin Madison, Biol Syst Engn, Madison, WI 53706 USA
[2] Univ Wisconsin Madison, Dept Hort, Madison, WI 53706 USA
Keywords
Potato; Yield prediction; Machine learning; Synthetic data; SMOTE; Nitrogen status; Vegetation; Selection
DOI
10.1016/j.isprsjprs.2023.05.015
Chinese Library Classification
P9 [Physical Geography]
Subject classification codes
0705; 070501
Abstract
In recent years, integrating machine learning (ML) algorithms with remote sensing data has become a common practice for potato yield prediction at different scales. Because the quantity and quality of training data strongly affect the applicability of ML algorithms, using them effectively can be challenging and expensive in some cases. In this paper, we used the synthetic minority oversampling technique (SMOTE) to generate synthetic data for potato yield prediction. We conducted several experiments at two study sites, CS1 and CS2. SMOTE was used to produce synthetic data at five multiplication rates (5, 10, 20, 40, and 80). Six ML algorithms were used for potato yield prediction: random forest regression (RFR), support vector regression (SVR), K-nearest neighbor (KNN), extreme gradient boosting (XGB), deep neural network (DNN), and stacked autoencoder neural network (SAE). To train the ML algorithms, multiple sets of synthetically generated data were combined with the original data. The similarity between the synthetic and original data was evaluated with two metrics, Kullback-Leibler divergence (KLD) and Jensen-Shannon divergence (JSD), as well as PCA-based visualization. The root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²) were calculated to evaluate the performance of the ML algorithms in potato yield prediction. Both quantitative and visual evaluations showed close similarity between the synthetic and original data. The average JSD (KLD) was 0.00028 (0.0031) in CS1 and 0.161 (0.271) in CS2. The ML algorithms differed noticeably in how they responded to synthetic data. RFR, XGB, DNN, and SAE responded positively to the addition of synthetic data, whereas SVR and KNN responded negatively. DNN showed the strongest positive response, with an average RMSE change of -2.35 percentage points in CS1 and -24.54 percentage points in CS2. Although no single combination of ML algorithm and synthetic sample size gave the highest prediction performance in all settings, which is plausible given the inherent differences among the selected ML algorithms, RFR trained on the original data combined with synthetic data at a multiplication rate of 5 was the most appropriate choice for potato yield prediction.
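The abstract does not spell out how SMOTE was adapted to a continuous yield target or how the divergences were computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each row holds the predictors plus the yield value, that synthetic rows are made by interpolating a row toward one of its k nearest neighbours (the core SMOTE idea), and that JSD is estimated from shared-bin histograms. The function names, `k`, `bins`, and the random demo data are all hypothetical.

```python
import numpy as np

def smote_like_oversample(data, rate, k=5, seed=0):
    """Return rate * len(data) synthetic rows (assumed SMOTE-style interpolation)."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    k = min(k, n - 1)
    # Pairwise Euclidean distances; mask the diagonal so a row is never its own neighbour.
    dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    # Pick a base row, one of its k neighbours, and a random position between them.
    base = rng.integers(0, n, size=rate * n)
    picked = neighbours[base, rng.integers(0, k, size=rate * n)]
    gap = rng.random((rate * n, 1))
    return data[base] + gap * (data[picked] - data[base])

def js_divergence(a, b, bins=10):
    """Jensen-Shannon divergence (natural log) between histograms of two 1-D samples."""
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    p, _ = np.histogram(a, bins=bins, range=(lo, hi))
    q, _ = np.histogram(b, bins=bins, range=(lo, hi))
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(x, y):
        # Kullback-Leibler divergence, skipping empty bins of x (0 * log 0 = 0).
        mask = x > 0
        return float(np.sum(x[mask] * np.log(x[mask] / y[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical demo data: 30 plots, 3 predictors plus yield in the last column.
demo_rng = np.random.default_rng(42)
original = demo_rng.normal(size=(30, 4))
synthetic = smote_like_oversample(original, rate=5)   # multiplication rate of 5
combined = np.vstack([original, synthetic])           # training set: original + synthetic
print(synthetic.shape, combined.shape)
print(js_divergence(original[:, -1], synthetic[:, -1]))
```

Because every synthetic row is a convex combination of two original rows, it stays inside the range of the original data, which is one reason a low divergence between the two sets is expected. Note that with the natural logarithm, JSD is bounded by ln 2 (about 0.693).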
Pages: 12-25
Page count: 14
Related papers
50 in total
  • [21] Integration of Remote Sensing and Meteorological Data for Rapid Sugarcane Yield Estimation Using Machine Learning
    Virani, V. B.
    Kumar, Neeraj
    Mote, B. M.
    JOURNAL OF THE INDIAN SOCIETY OF REMOTE SENSING, 2025, 53 (04) : 1109 - 1124
  • [22] Alfalfa yield prediction using machine learning and UAV multispectral remote sensing
    Yan H.
    Zhuo Y.
    Li M.
    Wang Y.
    Guo H.
    Wang J.
    Li C.
    Ding F.
    Nongye Gongcheng Xuebao/Transactions of the Chinese Society of Agricultural Engineering, 2022, 38 (11) : 64 - 71
  • [23] Wheat Crop Field and Yield Prediction using Remote Sensing and Machine Learning
    Ayub, Maheen
    Khan, Najeed Ahmed
    Haider, Rana Zeeshan
    PROCEEDINGS OF 2ND IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (ICAI 2022), 2022, : 158 - 164
  • [24] Predicting Patterns of Firms' Vulnerability to Economic Crises Using Open Data, Synthetic Minority Oversampling Technique and Machine Learning
    Ali, Mohsan
    Loukis, Euripidis
    Charalabidis, Yannis
    PERSPECTIVES IN BUSINESS INFORMATICS RESEARCH, BIR 2023, 2023, 493 : 188 - 196
  • [25] Winter Wheat Yield Prediction Using Satellite Remote Sensing Data and Deep Learning Models
    Fu, Hongkun
    Lu, Jian
    Li, Jian
    Zou, Wenlong
    Tang, Xuhui
    Ning, Xiangyu
    Sun, Yue
    AGRONOMY-BASEL, 2025, 15 (01)
  • [26] Ensemble of Machine Learning Algorithms for Rice Grain Yield Prediction Using UAV-Based Remote Sensing
    Sarkar, Tapash Kumar
    Roy, Dilip Kumar
    Kang, Ye Seong
    Jun, Sae Rom
    Park, Jun Woo
    Ryu, Chan Seok
    JOURNAL OF BIOSYSTEMS ENGINEERING, 2024, 49 (01) : 1 - 19
  • [27] Enhancing crop yield prediction in Senegal using advanced machine learning techniques and synthetic data
    Razavi, Mohammad Amin
    Nejadhashemi, A. Pouyan
    Majidi, Babak
    Razavi, Hoda S.
    Kpodo, Josue
    Eeswaran, Rasu
    Ciampitti, Ignacio
    Prasad, P. V. Vara
    ARTIFICIAL INTELLIGENCE IN AGRICULTURE, 2024, 14 : 99 - 114
  • [28] Rice yield prediction through integration of biophysical parameters with SAR and optical remote sensing data using machine learning models
    Sah, Sonam
    Haldar, Dipanwita
    Singh, R. N.
    Das, B.
    Nain, Ajeet Singh
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [29] Enhancing food security: machine learning-based wheat yield prediction using remote sensing and climate data in Pakistan
    Nadeem Ahmed
    Syed Amer Mahmood
    Muhammad Haseeb
    Zainab Tahir
    Saira Batool
    Theoretical and Applied Climatology, 2025, 156 (5)
  • [30] Simultaneous corn and soybean yield prediction from remote sensing data using deep transfer learning
    Khaki, Saeed
    Pham, Hieu
    Wang, Lizhi
    SCIENTIFIC REPORTS, 2021, 11 (01)