The Impact of Data Preprocessing on Prediction Effectiveness

Times cited: 0
Authors
Kiersztyn, Adam [1]
Kiersztyn, Krystyna [2]
Affiliations
[1] Lublin Univ Technol, Dept Comp Sci, Lublin, Poland
[2] John Paul II Catholic Univ Lublin, Dept Math Modelling, Lublin, Poland
Keywords
Preprocessing; Prediction model; Outlier detection; Anomaly detection; Filling gaps
DOI
10.1007/978-3-031-23492-7_30
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This study examines the impact of data preprocessing on model performance. Using data on taxicab trips in New York City, a model predicting the average speed of a trip was built, and its effectiveness was evaluated using relative error. These results were compared with those of models built after first removing records containing missing data. The effect of removing outliers on model quality was also examined. An integral part of the paper is a new anomaly detection method, which performs a fuzzy classification of the declared trip distance into three classes. The classification indicator is the percentage of redundant distance relative to the Manhattan distance. The results of extensive numerical experiments confirm that preprocessing is necessary. Comparing several competing anomaly detection methods and prediction-model-building methods supports a reasonable generalization of the conclusions. In addition, the use of fuzzy sets for anomaly detection yielded a method that can be generalized to other transportation problems.
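The abstract names the indicator (redundant distance as a percentage of the Manhattan distance) and a fuzzy three-class split, but not the membership functions or class boundaries. The Python sketch below shows one way such an indicator and classification could be computed, plus a mean relative error helper for evaluating speed predictions. The class names (low/medium/high), the breakpoints of 20%, 50% and 80%, the use of planar pickup/dropoff coordinates, and every function name are hypothetical assumptions, not the authors' published method.

```python
"""Hypothetical sketch: redundant-distance indicator with fuzzy three-class labels.

Breakpoints, class names, and function names are illustrative assumptions,
not the parameters used in the original paper.
"""


def manhattan_distance(x1: float, y1: float, x2: float, y2: float) -> float:
    """L1 distance between pickup (x1, y1) and dropoff (x2, y2) in planar coordinates."""
    return abs(x2 - x1) + abs(y2 - y1)


def redundant_distance_pct(declared: float, manhattan: float) -> float:
    """Percentage by which the declared trip distance exceeds the Manhattan distance."""
    if manhattan <= 0:
        raise ValueError("Manhattan distance must be positive")
    return 100.0 * (declared - manhattan) / manhattan


def _falling(x: float, b: float, c: float) -> float:
    """1 for x <= b, decreasing linearly to 0 at x = c."""
    if x <= b:
        return 1.0
    if x >= c:
        return 0.0
    return (c - x) / (c - b)


def _rising(x: float, a: float, b: float) -> float:
    """0 for x <= a, increasing linearly to 1 at x = b."""
    return 1.0 - _falling(x, a, b)


# Hypothetical breakpoints, in percent of redundant distance.
LOW_END, MID_PEAK, HIGH_START = 20.0, 50.0, 80.0


def fuzzy_memberships(pct: float) -> dict[str, float]:
    """Membership degrees in three hypothetical classes; they always sum to 1."""
    return {
        "low": _falling(pct, LOW_END, MID_PEAK),
        "medium": min(_rising(pct, LOW_END, MID_PEAK), _falling(pct, MID_PEAK, HIGH_START)),
        "high": _rising(pct, MID_PEAK, HIGH_START),
    }


def classify(pct: float) -> str:
    """Crisp label: the class with the highest membership (ties go to the first listed)."""
    memberships = fuzzy_memberships(pct)
    return max(memberships, key=memberships.get)


def mean_relative_error(y_true: list[float], y_pred: list[float]) -> float:
    """Mean of |pred - true| / |true|; assumes no true value is zero."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # A trip whose declared distance (12) is well above its Manhattan distance (7).
    pct = redundant_distance_pct(12.0, manhattan_distance(0.0, 0.0, 3.0, 4.0))
    print(f"{pct:.1f}% ->", classify(pct), fuzzy_memberships(pct))
```

The overlapping linear memberships form a fuzzy partition, so a trip near a boundary belongs partly to two classes instead of being forced into one. In a preprocessing pipeline, records labelled with the most suspicious class could then be flagged or removed before the speed model is trained.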
Pages: 353-362
Number of pages: 10