The Effect of Preprocessing Techniques, Applied to Numeric Features, on Classification Algorithms' Performance

Cited by: 37
Authors
Alshdaifat, Esra'a [1]
Alshdaifat, Doa'a [1]
Alsarhan, Ayoub [1]
Hussein, Fairouz [1]
El-Salhi, Subhieh Moh'd Faraj S. [1]
Affiliation
[1] Hashemite Univ, Fac Prince Al Hussein Bin Abdallah II Informat Te, Dept Comp Informat Syst, POB 330127, Zarqa 13133, Jordan
Keywords
preprocessing; classification algorithms; normalization; missing values; classification performance; data cleaning
DOI
10.3390/data6020011
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The performance of a prediction model depends on several factors, and one of the most significant is the choice of preprocessing techniques: preprocessing is essential to building an effective and efficient classification model. This paper investigates the impact of the most widely used preprocessing techniques for numerical features on the performance of classification algorithms. The effect of combining various normalization techniques with missing-value handling strategies is assessed on eighteen benchmark datasets, using two well-known classification algorithms, several performance evaluation metrics, and statistical significance tests. According to the reported experimental results, the impact of the adopted preprocessing techniques varies from one classification algorithm to another. In addition, a statistically significant difference between the considered preprocessing techniques is demonstrated.
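As a rough illustration of what combining a missing-value strategy with a normalization technique can look like for a single numeric feature, here is a minimal Python sketch. The abstract does not name the specific techniques studied, so mean imputation, min-max scaling, and z-score standardization are assumptions chosen only because they are common; the function names and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def min_max(column):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature carries no information; map it to 0.0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def z_score(column):
    """Center values on the mean and divide by the population standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


# Hypothetical numeric feature with one missing value.
feature = [2.0, None, 4.0, 6.0]

imputed = impute_mean(feature)      # missing entry replaced by the observed mean
scaled = min_max(imputed)           # imputation followed by min-max scaling
standardized = z_score(imputed)     # imputation followed by z-score standardization
```

Each imputation and normalization pairing yields a different version of the feature, and a comparison of this kind would train the same classifier on each version and compare the resulting scores.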
Pages: 1-23
Number of pages: 23