Effect of Data Scaling Methods on Machine Learning Algorithms and Model Performance

Cited by: 193

Authors:

Ahsan, Md Manjurul [1]

Mahmud, M. A. Parvez [2]

Saha, Pritom Kumar [3]

Gupta, Kishor Datta [4]

Siddique, Zahed [5]

Affiliations:

[1] Univ Oklahoma, Sch Ind & Syst Engn, Norman, OK 73019 USA

[2] Deakin Univ, Sch Engn, Waurn Ponds, Vic 3216, Australia

[3] Univ Oklahoma, Mewbourne Coll Earth & Energy, Norman, OK 73019 USA

[4] Univ Memphis, Dept Comp Sci, Memphis, TN 38111 USA

[5] Univ Oklahoma, Sch Aerosp & Mech Engn, Norman, OK 73019 USA

Source:

TECHNOLOGIES | 2021 / Vol. 9 / Issue 03

Keywords:

heart disease; machine learning algorithm; data scaling; prediction; automated model; HEART-DISEASE; FEATURE-SELECTION; PREDICTION; DIAGNOSIS; FEATURES;

DOI:

10.3390/technologies9030052

Chinese Library Classification number:

T [Industrial Technology]

Discipline classification code:

08

Abstract:

Heart disease, one of the main reasons behind the high mortality rate around the world, requires a sophisticated and expensive diagnosis process. In the recent past, many studies have demonstrated machine learning approaches as an opportunity to diagnose heart disease patients efficiently. However, challenges associated with datasets, such as missing data, inconsistent data, and mixed data (containing inconsistent and missing values in both numerical and categorical form), are often obstacles in medical diagnosis. Such inconsistency leads to a higher probability of misprediction and misleading results. Data preprocessing steps like feature reduction, data conversion, and data scaling are employed to form a standard dataset; such measures play a crucial role in reducing inaccuracy in the final prediction. This paper evaluates eleven machine learning (ML) algorithms, namely Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Classification and Regression Trees (CART), Naive Bayes (NB), Support Vector Machine (SVM), XGBoost (XGB), Random Forest Classifier (RF), Gradient Boost (GB), AdaBoost (AB), and Extra Tree Classifier (ET), together with six data scaling methods, namely Normalization (NR), Standscale (SS), MinMax (MM), MaxAbs (MA), Robust Scaler (RS), and Quantile Transformer (QT), on a dataset of information on patients with heart disease. The results show that CART, combined with RS or QT, outperforms all other ML algorithms with 100% accuracy, 100% precision, 99% recall, and 100% F1 score. The study outcomes demonstrate that a model's performance varies depending on the data scaling method.
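The abstract names the scaling methods but does not give their formulas. The following is a minimal sketch of four of them as they are commonly defined (standard, min-max, max-abs, and robust scaling), applied to a single feature column. The function names and the toy values are hypothetical and are not taken from the paper; the paper's own implementation may differ. Normalization and the Quantile Transformer are left out because their definitions vary more between libraries.

```python
import statistics


def standard_scale(xs):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(xs)
    std = statistics.pstdev(xs)
    return [(x - mean) / std for x in xs]


def min_max_scale(xs):
    """Map the smallest value to 0 and the largest to 1."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def max_abs_scale(xs):
    """Divide by the largest absolute value, so results lie in [-1, 1]."""
    biggest = max(abs(x) for x in xs)
    return [x / biggest for x in xs]


def robust_scale(xs):
    """Center on the median and divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return [(x - median) / (q3 - q1) for x in xs]


# Hypothetical feature column with one outlier (not data from the paper).
# Constant columns are not handled: they would cause a division by zero.
feature = [1, 2, 3, 4, 100]

for scaler in (standard_scale, min_max_scale, max_abs_scale, robust_scale):
    print(scaler.__name__, [round(v, 3) for v in scaler(feature)])
```

On a column like this one, the min-max and max-abs versions squeeze the four small values close together because the outlier sets the range, while the robust version keeps them spread out because the median and interquartile range ignore the outlier. This is one plausible reason why results can differ between scaling methods.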


Pages: 17
