Data preprocessing impact on machine learning algorithm performance

Cited by: 3
Authors
Amato, Alberto [1]
Di Lecce, Vincenzo [1]
Affiliation
[1] Politecnico di Bari, Department of Electrical and Information Engineering, Bari, Italy
Keywords
data analysis; PCA; SPQR; FCM; dimensionality reduction; approximations; eigenmaps
DOI
10.1515/comp-2022-0278
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Artificial intelligence applications are becoming more popular and are producing better outcomes in numerous research fields. However, their effectiveness depends heavily on the quantity and quality of the data used. The volume of available data has grown significantly in recent years, but more data does not always lead to better results, because the information content of the data also matters. This study evaluates semi-pivoted QR (SPQR) approximation as a new data preprocessing technique for machine learning. SPQR was designed to approximate sparse matrices and acts as a feature selection algorithm. To the best of our knowledge, it has not previously been applied to data preprocessing for machine learning algorithms. The study measures the impact of SPQR on the performance of an unsupervised clustering algorithm. It compares these results with those obtained using principal component analysis (PCA) as the preprocessing step. The evaluation is conducted on several publicly available datasets. The findings suggest that SPQR can produce outcomes comparable to those achieved with PCA without altering the original dataset, because it selects a subset of the original features rather than transforming them.
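The following sketch makes the abstract's contrast concrete. It compares two ways of preparing data for clustering: column selection by pivoted QR, and projection onto principal components.

It rests on several assumptions that are not taken from the paper:

- The clustering algorithm is assumed to be fuzzy C-means, only because FCM appears among the keywords.
- The column selection is a plain dense, greedy column-pivoted QR (Gram-Schmidt) step. It is not the semi-pivoted algorithm for sparse matrices itself.
- The data are a small synthetic set, not the public datasets used in the study.
- Agreement is scored with the Rand index, which the abstract does not mention.
- The farthest-first initialization of the cluster centers was chosen only to make the run reproducible.

All function names (`select_columns_pivoted_qr`, `pca_scores`, `fuzzy_c_means`, `rand_index`) are hypothetical.

```python
import numpy as np


def select_columns_pivoted_qr(X, k):
    """Return indices of k original columns of X chosen by greedy pivoting.

    Each step takes the column whose residual (the part not yet spanned by
    the chosen columns) has the largest norm, the rule used by QR with
    column pivoting. Only indices are returned; X itself is not modified.
    """
    R = np.asarray(X, dtype=float) - np.mean(X, axis=0)  # centered residuals
    chosen = []
    for _ in range(k):
        norms = np.linalg.norm(R, axis=0)
        norms[chosen] = -1.0  # never pick the same column twice
        j = int(np.argmax(norms))
        if norms[j] < 1e-12:  # remaining columns add no new direction
            break
        chosen.append(j)
        q = R[:, j] / norms[j]
        R = R - np.outer(q, q @ R)  # remove direction q from every column
    return chosen


def pca_scores(X, k):
    """Project centered X onto its first k principal components (new features)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def fuzzy_c_means(X, c, m=2.0, iters=200):
    """Fuzzy C-means; returns the hard label of each row (max membership)."""
    X = np.asarray(X, dtype=float)
    # Deterministic seeding (an assumption): farthest-first traversal from row 0.
    idx = [0]
    for _ in range(c - 1):
        dist = np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2)
        idx.append(int(np.argmax(dist.min(axis=1))))
    centers = X[idx].copy()
    for _ in range(iters):
        d = np.fmax(np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), 1e-12)
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)  # memberships, rows sum to 1
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # weighted cluster means
    return U.argmax(axis=1)


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same/different)."""
    a, b = np.asarray(a), np.asarray(b)
    iu = np.triu_indices(len(a), k=1)
    same_a = (a[:, None] == a[None, :])[iu]
    same_b = (b[:, None] == b[None, :])[iu]
    return float(np.mean(same_a == same_b))


# Hypothetical toy data: 3 blobs in two informative features (columns 0, 1),
# a redundant column 2 = col0 + col1, and a low-variance noise column 3.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1, 2], 50)
means = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
F = means[truth] + rng.normal(scale=0.5, size=(150, 2))
X = np.column_stack([F, F.sum(axis=1), rng.normal(scale=0.1, size=150)])

cols = select_columns_pivoted_qr(X, k=2)
labels_spqr = fuzzy_c_means(X[:, cols], c=3)       # original columns, untouched
labels_pca = fuzzy_c_means(pca_scores(X, 2), c=3)  # rotated feature space
print("selected columns:", cols)
print("Rand index vs truth (selection, PCA):",
      rand_index(labels_spqr, truth), rand_index(labels_pca, truth))
```

The two inputs to `fuzzy_c_means` show the difference the abstract emphasizes. `X[:, cols]` is a subset of the original columns, so each clustering feature keeps its measured meaning. `pca_scores(X, 2)` instead returns linear combinations of all four columns.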
Pages: 16