Analysis of missing data and comparing the accuracy of imputation methods using wheat crop data

Cited by: 0
Authors
Preeti Saini
Bharti Nagpal
Affiliations
[1] NSUT East Campus (Formerly AIACTR), Department of Computer Engineering
[2] USICT, Department of Computer Science and Engineering
[3] Guru Gobind Singh Indraprastha University (GGSIPU)
[4] Guru Nanak Dev DSEU Rohini Campus
[5] Delhi Skill and Entrepreneurship University
[6] NSUT East Campus (Formerly AIACTR)
Source
Keywords
Missing Data; Imputation; Multiple Regression; MissForest; MICE
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
In realistic settings, datasets contain missing values that arise during data collection. To build an effective prediction model, missing values in the attributes that affect crop growth must be handled appropriately in the crop dataset. This study aims to impute missing data in a wheat crop yield dataset consisting of climatic parameters and historical data for 370 districts in the major wheat-producing states of India. This work supports crop estimation and the forecasting of production at regular intervals. Imputation techniques that replace missing data are categorized into statistical and machine learning based methods. We explored the performance of popular techniques: Arithmetic Average Replacement, Median Imputation, Linear Interpolation, Average Imputation by Nearby Districts, K-Nearest Neighbour, MissForest, Regression, and MICE. We also evaluated these methods on the Bias and Steel energy consumption datasets from the UCI Machine Learning Repository. The imputed results were fed to multiple regression prediction models to evaluate the efficiency of the imputation approaches qualitatively. The results show that Arithmetic Average Replacement performs well among the statistical methods (R2 = 0.83; RMSE = 0.47; MAE = 0.372; MSE = 0.229), whereas among the machine learning based methods, the Random Forest-based MissForest method and MICE performed well (R2 = 0.80; MAE = 0.3825; MSE = 0.249; RMSE = 0.499). We hope our results help researchers select appropriate pre-processing strategies and improve data quality.
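As a rough illustration of three of the statistical methods named in the abstract (average replacement, median imputation, and linear interpolation), the following minimal Python sketch imputes gaps in a single series and scores each method with RMSE and MAE. It is not the authors' pipeline: the yield values are hypothetical, the function names are invented for this example, and the imputations are scored directly against deliberately hidden true values rather than through a downstream multiple regression model as in the study.

```python
import math
import statistics


def impute_mean(values):
    """Replace each missing entry (None) with the mean of the observed entries."""
    fill = statistics.mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def impute_median(values):
    """Replace each missing entry (None) with the median of the observed entries."""
    fill = statistics.median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def impute_linear(values):
    """Interpolate linearly between the nearest observed neighbours.

    Gaps at either end of the series take the nearest observed value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    result = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            result[i] = values[right]
        elif right is None:
            result[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            result[i] = values[left] + weight * (values[right] - values[left])
    return result


def rmse(truth, estimate):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth))


def mae(truth, estimate):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - e) for t, e in zip(truth, estimate)) / len(truth)


# Hypothetical yearly yields for one district (illustrative numbers only).
truth = [2.0, 2.5, 3.0, 3.5, 4.0, 6.0]
hidden = [1, 4]  # positions masked to simulate missing data
masked = [None if i in hidden else v for i, v in enumerate(truth)]

methods = {"mean": impute_mean, "median": impute_median, "linear": impute_linear}
scores = {}
for name, method in methods.items():
    imputed = method(masked)
    actual = [truth[i] for i in hidden]
    filled = [imputed[i] for i in hidden]
    scores[name] = (rmse(actual, filled), mae(actual, filled))
    print(name, scores[name])
```

Scoring only the masked positions keeps the comparison focused on imputation error. Evaluating through a prediction model, as the study does, additionally measures how much that error matters for the final forecast.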
Pages: 40393 - 40414
Page count: 21
Related papers
50 in total
  • [1] Analysis of missing data and comparing the accuracy of imputation methods using wheat crop data
    Saini, Preeti
    Nagpal, Bharti
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (14) : 40393 - 40414
  • [2] Improving Accuracy Rate of Imputation of Missing Data using Classifier Methods
    Thirukumaran, S.
    Sumathi, A.
    [J]. PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND CONTROL (ISCO'16), 2016
  • [3] Comparison of missing value imputation methods for crop yield data
    Lokupitiya, Ravindra S.
    Lokupitiya, Erandathie
    Paustian, Keith
    [J]. ENVIRONMETRICS, 2006, 17 (04) : 339 - 349
  • [4] Comparison of missing data imputation methods using weather data
    Nida, Hafiza
    Kashif, Muhammad
    Khan, Muhammad Imran
    Ghamkhar, Madiha
    [J]. PAKISTAN JOURNAL OF AGRICULTURAL SCIENCES, 2023, 60 (02) : 327 - 336
  • [5] Missing Data and Imputation Methods
    Schober, Patrick
    Vetter, Thomas R.
    [J]. ANESTHESIA AND ANALGESIA, 2020, 131 (05) : 1419 - 1420
  • [6] New imputation methods for missing data using quantiles
    Munoz, J. F.
    Rueda, M.
    [J]. JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2009, 232 (02) : 305 - 317
  • [7] Comparing multiple imputation methods for systematically missing subject-level data
    Kline, David
    Andridge, Rebecca
    Kaizar, Eloise
    [J]. RESEARCH SYNTHESIS METHODS, 2017, 8 (02) : 136 - 148
  • [8] Comparison of imputation and imputation-free methods for statistical analysis of mass spectrometry data with missing data
    Taylor, Sandra
    Ponzini, Matthew
    Wilson, Machelle
    Kim, Kyoungmi
    [J]. BRIEFINGS IN BIOINFORMATICS, 2022, 23 (01)
  • [9] Missing data imputation using fuzzy-rough methods
    Amiri, Mehran
    Jensen, Richard
    [J]. NEUROCOMPUTING, 2016, 205 : 152 - 164
  • [10] Missing data and imputation methods in partition of variables
    da Silva, AL
    Saporta, G
    Bacelar-Nicolau, H
    [J]. CLASSIFICATION, CLUSTERING, AND DATA MINING APPLICATIONS, 2004 : 631 - 637