The Impact of Data Preprocessing on Prediction Effectiveness

Cited by: 0
Authors
Kiersztyn, Adam [1]
Kiersztyn, Krystyna [2]
Affiliations
[1] Lublin Univ Technol, Dept Comp Sci, Lublin, Poland
[2] John Paul II Catholic Univ Lublin, Dept Math Modelling, Lublin, Poland
Keywords
Preprocessing; Prediction model; Outlier detection; Anomaly detection; Filling gaps;
DOI
10.1007/978-3-031-23492-7_30
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This study examines the impact of data preprocessing on model performance. Using data describing taxicab trips in New York City, a model predicting the average speed of a trip was built, and its effectiveness was measured by relative error. The results were compared with those of models built after the data had first been cleaned of records containing missing data. The effect of removing outliers on model quality was also examined. An integral part of the paper is the description of a new anomaly detection method. The authors' method involves fuzzy classification of the declared distance into three classes, using the percentage of redundant distance relative to the Manhattan distance as the classification indicator. The results of a wide range of numerical experiments confirm the necessity of preprocessing. Comparing a number of competing anomaly detection and prediction model building methods allows the conclusions to be reasonably generalized. Additionally, the use of fuzzy sets for anomaly detection enabled the development of a method that can be generalized to other transportation problems.
Pages: 353 - 362
Number of pages: 10
Related papers
50 in total
  • [31] A Two-stage Data Preprocessing Approach for Software Fault Prediction
    Chen, Jiaqiang
    Liu, Shulong
    Liu, Wangshu
    Chen, Xiang
    Gu, Qing
    Chen, Daoxu
    2014 EIGHTH INTERNATIONAL CONFERENCE ON SOFTWARE SECURITY AND RELIABILITY, 2014, : 20 - 29
  • [32] On Selected Data Preprocessing Procedures with the Classification and Prediction Software System (CLAPSS)
    Pancerz, Krzysztof
    Grochowalski, Piotr
    Paja, Wieslaw
    2016 INTERNATIONAL CONFERENCE ON INFORMATION AND DIGITAL TECHNOLOGIES (IDT), 2016, : 219 - 226
  • [33] Impact of Boolean factorization as preprocessing methods for classification of Boolean data
    Belohlavek, Radim
    Outrata, Jan
    Trnecka, Martin
    ANNALS OF MATHEMATICS AND ARTIFICIAL INTELLIGENCE, 2014, 72 (1-2) : 3 - 22
  • [34] Iliou Machine Learning Data Preprocessing Method for Stress Level Prediction
    Iliou, Theodoros
    Konstantopoulou, Georgia
    Stephanakis, Ioannis
    Anastasopoulos, Konstantinos
    Lymberopoulos, Dimitrios
    Anastassopoulos, George
    ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, AIAI 2018, 2018, 519 : 351 - 361
  • [35] Interest rate prediction: a neuro-hybrid approach with data preprocessing
    Mehdiyev, Nijat
    Enke, David
    INTERNATIONAL JOURNAL OF GENERAL SYSTEMS, 2014, 43 (05) : 535 - 550
  • [36] LSTM with Wavelet Transform Based Data Preprocessing for Stock Price Prediction
    Liang, Xiaodan
    Ge, Zhaodi
    Sun, Liling
    He, Maowei
    Chen, Hanning
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2019, 2019
  • [37] Grey three index prediction method based on data assimilation preprocessing
    Tian, Zhouyu
    Zhang, Guangsheng
    Jiang, Jinqi
    Zhou, Mi
    Yang, Xiaoli
    Wang, Zhenhua
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2019, 22 (02): S4859 - S4867
  • [38] Grey three index prediction method based on data assimilation preprocessing
    Zhouyu Tian
    Guangsheng Zhang
    Jinqi Jiang
    Mi Zhou
    Xiaoli Yang
    Zhenhua Wang
    Cluster Computing, 2019, 22 : 4859 - 4867
  • [39] Prediction of Distillation Column Temperature Using Machine Learning and Data Preprocessing
    Lee, Yechan
    Choi, Yeongryeol
    Cho, Hyungtae
    Kim, Junghwan
    KOREAN CHEMICAL ENGINEERING RESEARCH, 2021, 59 (02): 191 - 199
  • [40] Multiclass liver disease prediction with adaptive data preprocessing and ensemble modeling
    Al Ahad, Abdullah
    Das, Bibhakar
    Khan, Md Raihan
    Saha, Nitol
    Zahid, Abu
    Ahmad, Mohiuddin
    RESULTS IN ENGINEERING, 2024, 22