The Impact of Data Preprocessing on Prediction Effectiveness

Authors
Kiersztyn, Adam [1]
Kiersztyn, Krystyna [2]
Affiliations
[1] Lublin Univ Technol, Dept Comp Sci, Lublin, Poland
[2] John Paul II Catholic Univ Lublin, Dept Math Modelling, Lublin, Poland
Source
ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2022, PT I | 2023 / Vol. 13588
Keywords
Preprocessing; Prediction model; Outlier detection; Anomaly detection; Filling gaps;
DOI
10.1007/978-3-031-23492-7_30
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This study examines the impact of data preprocessing on model performance. Using data describing taxicab trips in New York City, a model predicting the average speed of a trip was built, and its effectiveness was measured by relative error. The results were compared with those of models built after removing records containing missing data. The effect of removing outliers on model quality was also examined. An integral part of the paper is the description of a new anomaly detection method. The authors' method involves fuzzy classification of the declared distance into three classes, using the percentage of redundant distance relative to the Manhattan distance as the classification indicator. The results of a wide range of numerical experiments confirm the necessity of preprocessing. Comparing a number of competing anomaly detection and prediction model building methods allows for a reasonable generalization of the conclusions. Additionally, the use of fuzzy sets for anomaly detection yielded a method that can be generalized to other transportation problems.
Pages: 353-362
Page count: 10
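
The abstract names an indicator (the percentage of redundant distance relative to the Manhattan distance), a fuzzy classification into three classes, and relative error as the quality measure, but it gives no formulas. The following is a minimal sketch of how these pieces might fit together, not the authors' implementation. The class names ("plausible", "suspicious", "anomalous"), the piecewise-linear membership functions, the breakpoints `low` and `high`, and the use of planar coordinates are all assumptions made for illustration.

```python
def manhattan_distance(x1, y1, x2, y2):
    """Manhattan (city-block) distance between two planar points."""
    return abs(x2 - x1) + abs(y2 - y1)


def redundant_percentage(declared, manhattan):
    """Percentage by which the declared distance exceeds the Manhattan distance."""
    if manhattan <= 0:
        raise ValueError("Manhattan distance must be positive")
    return 100.0 * (declared - manhattan) / manhattan


def fuzzy_memberships(p, low=20.0, high=100.0):
    """Membership degrees of redundancy p (in percent) in three classes.

    Assumption: piecewise-linear memberships with hypothetical breakpoints;
    the paper's actual membership functions are not given in the abstract.
    """
    mid = (low + high) / 2.0
    if p <= low:
        return {"plausible": 1.0, "suspicious": 0.0, "anomalous": 0.0}
    if p >= high:
        return {"plausible": 0.0, "suspicious": 0.0, "anomalous": 1.0}
    if p <= mid:
        s = (p - low) / (mid - low)
        return {"plausible": 1.0 - s, "suspicious": s, "anomalous": 0.0}
    s = (p - mid) / (high - mid)
    return {"plausible": 0.0, "suspicious": 1.0 - s, "anomalous": s}


def relative_error(predicted, actual):
    """Relative error of a prediction with respect to the true value."""
    return abs(predicted - actual) / abs(actual)


# Example: a trip with a declared distance of 10.5 units between two points
# that are 7 units apart in Manhattan distance.
d = manhattan_distance(0.0, 0.0, 3.0, 4.0)
p = redundant_percentage(10.5, d)
memberships = fuzzy_memberships(p)
```

In this sketch a trip would be flagged when its "anomalous" membership exceeds some chosen threshold; that threshold, like the breakpoints, would have to be tuned on real data.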