Utilization of synthetic minority oversampling technique for improving potato yield prediction using remote sensing data and machine learning algorithms with small sample size of yield data

Cited by: 10
Authors
Ebrahimy, Hamid [1 ]
Wang, Yi [2 ]
Zhang, Zhou [1 ]
Affiliations
[1] Univ Wisconsin Madison, Biol Syst Engn, Madison, WI 53706 USA
[2] Univ Wisconsin Madison, Dept Hort, Madison, WI 53706 USA
Keywords
Potato; Yield prediction; Machine learning; Synthetic data; SMOTE; NITROGEN STATUS; VEGETATION; SELECTION
DOI
10.1016/j.isprsjprs.2023.05.015
Chinese Library Classification number
P9 [Physical Geography];
Discipline classification code
0705 ; 070501 ;
Abstract
In recent years, integrating machine learning (ML) algorithms with remote sensing data has become a common practice for potato yield prediction at different scales. Because the quantity and quality of training data strongly affect the applicability of ML algorithms, using them effectively can be challenging and expensive in some cases. In this paper, we used the synthetic minority oversampling technique (SMOTE) to generate synthetic data for potato yield prediction. We conducted several experiments at two study sites, CS1 and CS2. SMOTE was used to produce synthetic data at five multiplication rates (5, 10, 20, 40, and 80). Six ML algorithms were used for potato yield prediction: random forest regression (RFR), support vector regression (SVR), K-nearest neighbor (KNN), extreme gradient boosting (XGB), deep neural network (DNN), and stacked autoencoder neural network (SAE). To train the ML algorithms, multiple sets of synthetically generated data were combined with the original data. The similarity between the synthetic and original data was evaluated with two metrics, Kullback-Leibler divergence (KLD) and Jensen-Shannon divergence (JSD), as well as PCA-based visualization. The root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²) were calculated to evaluate the performance of the ML algorithms in potato yield prediction. Both quantitative and visual evaluations showed close similarity between the synthetic and original data. The average JSD (KLD) values were 0.00028 (0.0031) in CS1 and 0.161 (0.271) in CS2. The ML algorithms differed noticeably in how they responded to synthetic data. RFR, XGB, DNN, and SAE responded positively to the addition of synthetic data, whereas SVR and KNN responded negatively. The DNN algorithm exhibited the strongest positive response, with an average RMSE change of -2.35 percentage points in CS1 and -24.54 percentage points in CS2. Although no single combination of ML algorithm and synthetic sample size gave the best prediction performance in all settings, which is plausible given the inherent differences among the selected ML algorithms, the RFR algorithm trained on the original data combined with quintupled synthetic data was the most appropriate choice for potato yield prediction.
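The augmentation and similarity check described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each sample is one row of predictors with the yield appended as the last column, so that SMOTE-style interpolation between a row and one of its nearest neighbors produces a synthetic predictor vector and a synthetic yield together. The function names `smote_like_oversample` and `js_divergence`, the neighbor count `k=5`, the histogram binning, and the random toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def smote_like_oversample(data, rate, k=5, seed=0):
    """Return rate * len(data) synthetic rows built by SMOTE-style interpolation.

    data: (n_samples, n_columns) array; the last column is assumed to be yield.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    k = min(k, n - 1)
    # Pairwise Euclidean distances; the diagonal is set to inf so that a row
    # is never selected as its own neighbor.
    dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :k]
    # Each original row acts as the base point `rate` times.
    base = np.repeat(np.arange(n), rate)
    picked = neighbors[base, rng.integers(0, k, size=base.size)]
    # Random position on the segment between the base row and its neighbor.
    gap = rng.random((base.size, 1))
    return data[base] + gap * (data[picked] - data[base])


def kl_divergence(p, q):
    """KLD between two discrete distributions, summed where p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def js_divergence(a, b, bins=20):
    """JSD between two 1-D samples, estimated from histograms on shared bins."""
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    p, _ = np.histogram(a, bins=bins, range=(lo, hi))
    q, _ = np.histogram(b, bins=bins, range=(lo, hi))
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Toy data standing in for a small yield data set: 30 samples, 3 predictors + yield.
toy_rng = np.random.default_rng(42)
original = toy_rng.normal(size=(30, 4))

# Multiplication rate 5, the smallest rate listed in the abstract.
synthetic = smote_like_oversample(original, rate=5)
combined = np.vstack([original, synthetic])

# Similarity of the yield column between original and synthetic rows.
jsd_yield = js_divergence(original[:, -1], synthetic[:, -1])
print(combined.shape, jsd_yield)
```

With the natural logarithm used here, the JSD lies between 0 (identical histograms) and ln 2, so values close to 0 indicate that the synthetic rows follow the original distribution closely. The histogram estimate depends on the number of bins, so values from this sketch are not comparable to the figures reported in the abstract.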
Pages: 12-25
Page count: 14