Utilization of synthetic minority oversampling technique for improving potato yield prediction using remote sensing data and machine learning algorithms with small sample size of yield data

Cited by: 10
Authors
Ebrahimy, Hamid [1]
Wang, Yi [2]
Zhang, Zhou [1]
Affiliations
[1] University of Wisconsin-Madison, Biological Systems Engineering, Madison, WI 53706, USA
[2] University of Wisconsin-Madison, Department of Horticulture, Madison, WI 53706, USA
Keywords
Potato; Yield prediction; Machine learning; Synthetic data; SMOTE; Nitrogen status; Vegetation; Selection
DOI
10.1016/j.isprsjprs.2023.05.015
Chinese Library Classification (CLC) number
P9 [Physical Geography]
Subject classification code
0705; 070501
Abstract
In recent years, integrating machine learning (ML) algorithms with remote sensing data has become a common practice for potato yield prediction at different scales. Because the quantity and quality of training data strongly affect the applicability of ML algorithms, using them effectively can in some cases be challenging and expensive. In this paper, we used the synthetic minority oversampling technique (SMOTE) algorithm to generate synthetic data for potato yield prediction. We conducted several experiments at two study sites, CS1 and CS2. SMOTE was used to produce synthetic data at five multiplication rates (5, 10, 20, 40, and 80). Six ML algorithms were used for potato yield prediction: random forest regression (RFR), support vector regression (SVR), K-nearest neighbor (KNN), extreme gradient boosting (XGB), deep neural network (DNN), and stacked autoencoder neural network (SAE). To train the ML algorithms, multiple sets of synthetically generated data were combined with the original data. The similarity between the synthetic and original data was evaluated with two metrics, Kullback-Leibler divergence (KLD) and Jensen-Shannon divergence (JSD), as well as with PCA-based visualization. The root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²) were calculated to evaluate the performance of the ML algorithms in potato yield prediction. Both quantitative and visual evaluations showed close similarity between the synthetic and original data. The average JSD (KLD) in CS1 and CS2 was 0.00028 (0.0031) and 0.161 (0.271), respectively. The ML algorithms differed noticeably in how they responded to synthetic data: RFR, XGB, DNN, and SAE responded positively to its addition, whereas SVR and KNN were the only algorithms that responded negatively.
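The abstract does not state how SMOTE, originally an oversampling method for class-imbalanced classification, was adapted to a continuous yield target, nor how the KLD and JSD were computed. The following is a minimal stdlib-only Python sketch under stated assumptions: each synthetic sample interpolates both the features and the yield between an original sample and one of its k nearest neighbours; the multiplication rate is the number of synthetic samples generated per original sample; and the divergences are computed on histograms of a single variable (here the yield) using natural logarithms. The function names `smote_regression`, `histogram`, `kld`, and `jsd`, the parameters `k` and `bins`, and the toy data are all hypothetical.

```python
import math
import random


def smote_regression(X, y, rate, k=5, seed=0):
    """Return rate * len(X) synthetic feature rows and matching targets.

    Hypothetical SMOTE-style interpolation for a continuous target: each
    synthetic sample lies on the segment between an original sample and one
    of its k nearest neighbours (Euclidean distance on the features), and the
    target is interpolated with the same random gap. Requires len(X) >= 2.
    """
    rng = random.Random(seed)
    n = len(X)
    k = min(k, n - 1)
    # k nearest neighbours of every original sample (brute force).
    neighbours = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(X[i], X[j]))
        neighbours.append(others[:k])
    X_syn, y_syn = [], []
    for _ in range(rate):
        for i in range(n):
            j = rng.choice(neighbours[i])
            gap = rng.random()  # uniform in [0, 1)
            X_syn.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
            y_syn.append(y[i] + gap * (y[j] - y[i]))
    return X_syn, y_syn


def histogram(values, lo, hi, bins=10, eps=1e-10):
    """Normalised histogram over [lo, hi]; eps keeps every bin positive."""
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / width)
        counts[max(0, min(idx, bins - 1))] += 1
    probs = [c / len(values) + eps for c in counts]
    total = sum(probs)
    return [p / total for p in probs]


def kld(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def jsd(p, q):
    """Jensen-Shannon divergence in nats (bounded above by ln 2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kld(p, m) + 0.5 * kld(q, m)


# Toy data (illustrative only): two spectral features per plot and a yield.
X = [[0.61, 0.32], [0.70, 0.35], [0.55, 0.30],
     [0.80, 0.41], [0.66, 0.33], [0.74, 0.38]]
y = [42.0, 47.5, 39.0, 55.0, 44.0, 50.5]

X_syn, y_syn = smote_regression(X, y, rate=5, k=3)
X_train, y_train = X + X_syn, y + y_syn  # original + synthetic training set

lo, hi = min(y + y_syn), max(y + y_syn)
p = histogram(y, lo, hi, bins=5)
q = histogram(y_syn, lo, hi, bins=5)
print(f"{len(X_train)} training samples, "
      f"JSD = {jsd(p, q):.4f}, KLD = {kld(p, q):.4f}")
```

Interpolating the target with the same gap as the features keeps every synthetic yield between the yields of its two parent samples, so this variant cannot extrapolate beyond the observed yield range. The small `eps` in `histogram` avoids taking the logarithm of zero when a bin is empty in one of the two sets.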
The DNN algorithm showed the strongest positive response to the addition of synthetic data, with an average RMSE change of -2.35 percentage points in CS1 and -24.54 percentage points in CS2. Although no single ML algorithm or synthetic sample size gave the best prediction performance in every setting, which is plausible given the inherent differences among the selected algorithms, RFR trained on the original data combined with quintupled synthetic data was the most appropriate choice for potato yield prediction.
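The abstract reports RMSE changes in percentage points without defining the base quantity. The sketch below computes RMSE, MAE, and R² and assumes, as a hypothesis only, that the reported change is the difference in relative RMSE (RMSE as a percentage of mean observed yield) between a model trained with synthetic data and one trained on the original data alone. The function names `relative_rmse` and `rmse_change_pp` and all numbers are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r2(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def relative_rmse(obs, pred):
    """RMSE as a percentage of mean observed yield (assumed definition)."""
    return 100.0 * rmse(obs, pred) / (sum(obs) / len(obs))


def rmse_change_pp(obs, pred_original_only, pred_with_synthetic):
    """Change in relative RMSE, in percentage points; negative = improvement."""
    return (relative_rmse(obs, pred_with_synthetic)
            - relative_rmse(obs, pred_original_only))


# Hypothetical test-set yields and predictions from two models.
obs = [40.0, 45.0, 50.0, 55.0]
pred_original_only = [44.0, 41.0, 54.0, 51.0]
pred_with_synthetic = [42.0, 44.0, 52.0, 54.0]

for name, pred in [("original only", pred_original_only),
                   ("with synthetic", pred_with_synthetic)]:
    print(f"{name}: RMSE={rmse(obs, pred):.3f}, "
          f"MAE={mae(obs, pred):.3f}, R2={r2(obs, pred):.3f}")
change = rmse_change_pp(obs, pred_original_only, pred_with_synthetic)
print(f"RMSE change: {change:.2f} percentage points")
```

Under this assumed definition, a negative value such as the reported -2.35 means the model trained with synthetic data had a lower relative RMSE than the baseline; if the paper instead reports a relative change of RMSE itself, the last function would divide the RMSE difference by the baseline RMSE rather than subtracting two percentages.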
Pages: 12-25
Page count: 14