Effect of Data Scaling Methods on Machine Learning Algorithms and Model Performance

Cited by: 193
Authors
Ahsan, Md Manjurul [1]
Mahmud, M. A. Parvez [2]
Saha, Pritom Kumar [3]
Gupta, Kishor Datta [4]
Siddique, Zahed [5]
Affiliations
[1] Univ Oklahoma, Sch Ind & Syst Engn, Norman, OK 73019 USA
[2] Deakin Univ, Sch Engn, Waurn Ponds, Vic 3216, Australia
[3] Univ Oklahoma, Mewbourne Coll Earth & Energy, Norman, OK 73019 USA
[4] Univ Memphis, Dept Comp Sci, Memphis, TN 38111 USA
[5] Univ Oklahoma, Sch Aerosp & Mech Engn, Norman, OK 73019 USA
Keywords
heart disease; machine learning algorithm; data scaling; prediction; automated model; HEART-DISEASE; FEATURE-SELECTION; PREDICTION; DIAGNOSIS; FEATURES
DOI
10.3390/technologies9030052
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
Heart disease, one of the leading causes of mortality worldwide, requires a sophisticated and expensive diagnostic process. Much recent literature has presented machine learning approaches as a way to diagnose heart disease patients efficiently. However, dataset problems such as missing data, inconsistent data, and mixed data (numerical and categorical features with inconsistently recorded missing values) are often obstacles in medical diagnosis. Such inconsistencies increase the probability of misprediction and lead to misleading results. Data preprocessing steps such as feature reduction, data conversion, and data scaling are employed to form a standard dataset; these measures play a crucial role in reducing inaccuracy in the final prediction. This paper evaluates eleven machine learning (ML) algorithms, namely Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Classification and Regression Trees (CART), Naive Bayes (NB), Support Vector Machine (SVM), XGBoost (XGB), Random Forest Classifier (RF), Gradient Boost (GB), AdaBoost (AB), and Extra Tree Classifier (ET), together with six data scaling methods, namely Normalization (NR), Standard Scaler (SS), MinMax (MM), MaxAbs (MA), Robust Scaler (RS), and Quantile Transformer (QT), on a dataset of patients with heart disease. The results show that CART combined with RS or QT outperforms all other ML algorithms, achieving 100% accuracy, 100% precision, 99% recall, and 100% F1 score. The study outcomes demonstrate that model performance varies with the data scaling method.
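The six scaling methods named in the abstract can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: it assumes conventions similar to scikit-learn's `MinMaxScaler`, `MaxAbsScaler`, `StandardScaler`, `RobustScaler`, `QuantileTransformer` (uniform output) and `Normalizer` (row-wise L2). It also assumes that "Normalization (NR)" means per-sample L2 normalization, which the abstract does not state. The quantile step is a simplified rank-based approximation, and the helper names and toy age/cholesterol values are hypothetical, not taken from the study's dataset.

```python
import math
import statistics
from typing import Callable, Dict, List

Column = List[float]
Matrix = List[List[float]]  # rows are samples (patients), columns are features


def _safe(scale: float) -> float:
    # Replace a zero scale with 1.0 so constant features do not divide by zero.
    return scale if scale != 0 else 1.0


def min_max(col: Column) -> Column:
    # MM: map each feature linearly onto [0, 1].
    lo, hi = min(col), max(col)
    return [(x - lo) / _safe(hi - lo) for x in col]


def max_abs(col: Column) -> Column:
    # MA: divide by the largest absolute value, giving values in [-1, 1].
    m = max(abs(x) for x in col)
    return [x / _safe(m) for x in col]


def standard(col: Column) -> Column:
    # SS: zero mean, unit (population) standard deviation.
    mu, sd = statistics.fmean(col), statistics.pstdev(col)
    return [(x - mu) / _safe(sd) for x in col]


def robust(col: Column) -> Column:
    # RS: centre on the median and divide by the interquartile range,
    # which limits the influence of outliers.
    q1, med, q3 = statistics.quantiles(col, n=4, method="inclusive")
    return [(x - med) / _safe(q3 - q1) for x in col]


def quantile_uniform(col: Column) -> Column:
    # QT (simplified): replace each value by its average rank / (n - 1),
    # an empirical CDF on [0, 1]; tied values share the mean of their ranks.
    ordered = sorted(col)
    n = len(col)
    out = []
    for x in col:
        first = ordered.index(x)
        last = n - 1 - ordered[::-1].index(x)
        out.append(((first + last) / 2) / _safe(n - 1))
    return out


def l2_normalize_rows(X: Matrix) -> Matrix:
    # NR (assumed): rescale each sample so its feature vector has L2 norm 1.
    out = []
    for row in X:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / _safe(norm) for v in row])
    return out


def scale_columns(X: Matrix, fn: Callable[[Column], Column]) -> Matrix:
    # Apply a per-feature scaler to every column, then rebuild the rows.
    cols = [fn(list(c)) for c in zip(*X)]
    return [list(r) for r in zip(*cols)]


COLUMN_SCALERS: Dict[str, Callable[[Column], Column]] = {
    "MM": min_max,
    "MA": max_abs,
    "SS": standard,
    "RS": robust,
    "QT": quantile_uniform,
}


def apply_scaler(name: str, X: Matrix) -> Matrix:
    # Dispatch on the abbreviations used in the abstract.
    if name == "NR":
        return l2_normalize_rows(X)
    return scale_columns(X, COLUMN_SCALERS[name])


if __name__ == "__main__":
    # Hypothetical toy data: [age, cholesterol] for four patients.
    X = [[40.0, 200.0], [50.0, 240.0], [60.0, 180.0], [70.0, 300.0]]
    for name in ["NR", "SS", "MM", "MA", "RS", "QT"]:
        rows = apply_scaler(name, X)
        print(name, [[round(v, 3) for v in r] for r in rows])
```

The per-feature scalers (MM, MA, SS, RS, QT) compute their statistics column by column, whereas the assumed NR variant rescales each patient record as a whole. In a real evaluation the statistics would be estimated on the training split only and then reused on the test split, so that no test information leaks into the model.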
Pages: 17