The Effect of Preprocessing Techniques, Applied to Numeric Features, on Classification Algorithms' Performance

Cited by: 37
Authors
Alshdaifat, Esra'a [1]
Alshdaifat, Doa'a [1]
Alsarhan, Ayoub [1]
Hussein, Fairouz [1]
El-Salhi, Subhieh Moh'd Faraj S. [1]
Affiliations
[1] Hashemite Univ, Fac Prince Al Hussein Bin Abdallah II Informat Te, Dept Comp Informat Syst, POB 330127, Zarqa 13133, Jordan
Keywords
preprocessing; classification algorithms; normalization; missing values; classification performance; data cleaning
DOI
10.3390/data6020011
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The performance of any prediction model depends on several factors, and one of the most significant is the choice of preprocessing techniques; preprocessing is essential to building an effective and efficient classification model. This paper investigates the impact of the most widely used preprocessing techniques for numerical features on the performance of classification algorithms. The effect of combining various normalization techniques with missing-value handling strategies is assessed on eighteen benchmark datasets, using two well-known classification algorithms, several performance evaluation metrics, and statistical significance tests. According to the reported experimental results, the impact of the adopted preprocessing techniques varies from one classification algorithm to another. In addition, a statistically significant difference between the considered data preprocessing techniques is demonstrated.
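As a rough illustration of what combining a missing-value strategy with a normalization technique can look like for a single numeric feature, here is a minimal Python sketch. The abstract does not name the specific techniques evaluated, so mean imputation, min-max scaling, and z-score standardization are assumptions chosen because they are common, and the function names are hypothetical rather than taken from the paper.

```python
import math
from statistics import mean, pstdev


def is_missing(x):
    """Treat None and NaN as missing values."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def impute_mean(column):
    """Replace missing entries with the mean of the observed values
    (an assumed strategy, not necessarily the one used in the paper)."""
    observed = [x for x in column if not is_missing(x)]
    fill = mean(observed)
    return [fill if is_missing(x) else x for x in column]


def min_max(column):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def z_score(column):
    """Standardize values to zero mean and unit (population) variance."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:  # constant feature: avoid division by zero
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# One numeric feature with a missing entry (illustrative values only).
feature = [2.0, None, 4.0, 6.0]
imputed = impute_mean(feature)
scaled = min_max(imputed)
standardized = z_score(imputed)
```

In an actual experiment the imputation value and the scaling parameters would be estimated on the training split only and then applied to the test split, so that no information leaks from the test data into the model.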
Pages: 1 / 23
Page count: 23
Related papers
50 in total
  • [21] Review of Feed Forward Neural Network classification preprocessing techniques
    Asadi, Roya
    Kareem, Sameem Abdul
    PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON MATHEMATICAL SCIENCES, 2014, 1602 : 567 - 573
  • [22] Preprocessing Techniques Based on LBP and Gabor Filters for Clothing Classification
    Thewsuwan, Srisupang
    Horio, Keiichi
    2016 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATION SYSTEMS (ISPACS), 2016, : 96 - 101
  • [23] Preprocessing compensation techniques for improved classification of imbalanced medical datasets
    Wosiak, Agnieszka
    Karbowiak, Sylwia
    PROCEEDINGS OF THE 2017 FEDERATED CONFERENCE ON COMPUTER SCIENCE AND INFORMATION SYSTEMS (FEDCSIS), 2017, : 203 - 211
  • [24] Classification of Cervical Cancer Data and The Effect of Random Subspace Algorithms on Classification Performance
    Palabas, Tugba
    Erkaymaz, Okan
    2018 26TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2018,
  • [25] The effect of preprocessing techniques on Twitter Sentiment Analysis
    Krouska, Akrivi
    Troussas, Christos
    Virvou, Maria
    2016 7TH INTERNATIONAL CONFERENCE ON INFORMATION, INTELLIGENCE, SYSTEMS & APPLICATIONS (IISA), 2016,
  • [26] Preprocessing procedures and supervised classification applied to a database of systematic soil survey
    Valadares, Alan Pessoa
    Coelho, Ricardo Marques
    de Medeiros Oliveira, Stanley Robson
    SCIENTIA AGRICOLA, 2019, 76 (05): : 439 - 447
  • [28] The effect of rebalancing techniques on the classification performance in cyberbullying datasets
    Khairy, Marwa
    Mahmoud, Tarek M.
    Abd-El-Hafeez, Tarek
    NEURAL COMPUTING & APPLICATIONS, 2024, 36 (03): : 1049 - 1065
  • [29] Comparison of intelligent classification techniques applied to marble classification
    Sousa, JMC
    Pinto, JRC
    IMAGE ANALYSIS AND RECOGNITION, PT 2, PROCEEDINGS, 2004, 3212 : 802 - 809
  • [30] Web Page Classification: Features and Algorithms
    Qi, Xiaoguang
    Davison, Brian D.
    ACM COMPUTING SURVEYS, 2009, 41 (02)