The Effect of Preprocessing Techniques, Applied to Numeric Features, on Classification Algorithms' Performance

Cited by: 37
Authors
Alshdaifat, Esra'a [1]
Alshdaifat, Doa'a [1]
Alsarhan, Ayoub [1]
Hussein, Fairouz [1]
El-Salhi, Subhieh Moh'd Faraj S. [1]
Affiliation
[1] Hashemite Univ, Fac Prince Al Hussein Bin Abdallah II Informat Te, Dept Comp Informat Syst, POB 330127, Zarqa 13133, Jordan
Keywords
preprocessing; classification algorithms; normalization; missing values; classification performance; data cleaning
DOI
10.3390/data6020011
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The performance of any prediction model depends on several factors, and one of the most significant is the choice of preprocessing techniques. In other words, preprocessing is an essential step in generating an effective and efficient classification model. This paper investigates the impact of the most widely used preprocessing techniques for numerical features on the performance of classification algorithms. The effect of combining various normalization techniques with missing-value handling strategies is assessed on eighteen benchmark datasets, using two well-known classification algorithms, several performance evaluation metrics, and statistical significance tests. According to the reported experimental results, the impact of the adopted preprocessing techniques varies from one classification algorithm to another. In addition, a statistically significant difference between the considered data preprocessing techniques is demonstrated.
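To make the idea of combining a missing-value strategy with a normalization technique concrete, the following is a minimal sketch in plain Python. The abstract does not name the specific techniques evaluated, so mean imputation, min-max scaling, and z-score standardization are assumed here only as common representatives. The function names `impute_mean`, `min_max`, and `z_score` and the sample column are hypothetical, not taken from the paper.

```python
from statistics import mean, pstdev
from typing import List, Optional


def impute_mean(column: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def min_max(column: List[float]) -> List[float]:
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def z_score(column: List[float]) -> List[float]:
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


# Hypothetical numeric feature with one missing value.
raw = [2.0, None, 4.0, 6.0]

# One "combination": impute first, then normalize.
filled = impute_mean(raw)
scaled = min_max(filled)
standardized = z_score(filled)
```

Each pairing of an imputation step with a scaling step yields a differently preprocessed copy of the data. A study of this kind would train the same classifier on each copy and compare the resulting scores.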
Pages: 1 / 23
Page count: 23
Related papers
50 in total
  • [31] The effect of numeric features on the scalability of inductive learning programs
    Paliouras, G
    Bree, DS
    MACHINE LEARNING: ECML-95, 1995, 912 : 218 - 231
  • [32] Efficient algorithms for finding optimal binary features in numeric and nominal labeled data
    Mampaey, Michael
    Nijssen, Siegfried
    Feelders, Ad
    Konijn, Rob
    Knobbe, Arno
    KNOWLEDGE AND INFORMATION SYSTEMS, 2015, 42 (02) : 465 - 492
  • [33] Performance comparison of different classification algorithms applied to the diagnosis of familial hypercholesterolemia in paediatric subjects
    João Albuquerque
    Ana Margarida Medeiros
    Ana Catarina Alves
    Mafalda Bourbon
    Marília Antunes
    Scientific Reports, 12
  • [34] Performance comparison of different classification algorithms applied to the diagnosis of familial hypercholesterolemia in paediatric subjects
    Albuquerque, Joao
    Medeiros, Ana Margarida
    Alves, Ana Catarina
    Bourbon, Mafalda
    Antunes, Marilia
    SCIENTIFIC REPORTS, 2022, 12 (01)
  • [35] Efficient algorithms for finding optimal binary features in numeric and nominal labeled data
    Michael Mampaey
    Siegfried Nijssen
    Ad Feelders
    Rob Konijn
    Arno Knobbe
    Knowledge and Information Systems, 2015, 42 : 465 - 492
  • [36] A tutorial on optimization techniques applied to DSM algorithms
    Neves, Darlene Maciel
    da Rocha Klautau Junior, Aldebaro Barreto
    Conte, Marcio Murilo
    de Medeiros, Eduardo Lins
    Reis, Jacklyn Dias
    Dortschy, Boris
    BROADBAND ACCESS COMMUNICATION TECHNOLOGIES II, 2007, 6776
  • [37] Classification algorithms applied to structure formation simulations
    Chacon, J.
    Vazquez, J. A.
    Almaraz, E.
    ASTRONOMY AND COMPUTING, 2022, 38
  • [38] Classification algorithms applied to structure formation simulations
    Chacón, J.
    Vázquez, J.A.
    Almaraz, E.
    Astronomy and Computing, 2022, 38
  • [39] Implementation and Efficient Analysis of Preprocessing Techniques in Deep Learning for Image Classification
    H., James Deva Koresh
    CURRENT MEDICAL IMAGING, 2024, 20
  • [40] Classification of Motor Tasks from EEG Signals Comparing Preprocessing Techniques
    Kauati-Saito, Eric
    da Silveira, Gustavo F. M.
    Da-Silva, Paulo J. G.
    Miranda de Sa, Antonio Mauricio F. L.
    Tierra-Criollo, Carlos Julio
    XXVI BRAZILIAN CONGRESS ON BIOMEDICAL ENGINEERING, CBEB 2018, VOL. 2, 2019, 70 (02) : 109 - 113