When to Use Standardization and Normalization: Empirical Evidence From Machine Learning Models and XAI

Cited by: 4
Authors
Sujon, Khaled Mahmud [1]
Hassan, Rohayanti Binti [2]
Towshi, Zeba Tusnia [3]
Othman, Manal A. [4]
Samad, Md Abdus [5]
Choi, Kwonhue [5]
Affiliations
[1] Univ Teknol Malaysia UTM, Fac Comp, Dept Software Engn, Johor Baharu 81310, Johor, Malaysia
[2] Univ Teknol Malaysia UTM, Fac Comp, Johor Baharu 81310, Johor, Malaysia
[3] Independent Univ, Dept Comp Sci & Engn, Dhaka 1229, Bangladesh
[4] Princess Nourah Bint Abdulrahman Univ, Coll Med, Med Educ Dept, Biomed Informat, Riyadh 11671, Saudi Arabia
[5] Yeungnam Univ, Dept Informat & Commun Engn, Gyongsan 38541, South Korea
Source
IEEE Access | 2024 / Vol. 12
Keywords
Standardization; normalization; feature scaling; data preprocessing; machine learning; explainable AI (XAI)
DOI
10.1109/ACCESS.2024.3462434
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Optimizing machine learning (ML) model performance relies heavily on appropriate data preprocessing techniques. Despite the widespread use of standardization and normalization, empirical comparisons across different models, dataset sizes, and domains remain sparse. This study bridges this gap by evaluating five machine learning algorithms, namely Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Adaptive Boosting (AdaBoost), on datasets of varying sizes from the business, health, and agriculture domains. The models are assessed without scaling, with standardized data, and with normalized data. The comparative analysis reveals that while standardization consistently improves the performance of linear models like SVM and LR for large and medium datasets, normalization enhances the performance of linear models for small datasets. Moreover, this study employs SHapley Additive exPlanations (SHAP) summary plots to understand how each feature contributes to the model's performance and interpretability with unscaled and scaled datasets. This study provides practical guidelines for selecting appropriate scaling techniques based on the characteristics of datasets and compatibility with various algorithms. Finally, this investigation lays a foundation for data preprocessing and feature engineering across diverse models and domains, offering actionable insights for practitioners.
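The two scaling techniques compared in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas the authors used, so this assumes the common definitions: standardization as z-score scaling (zero mean, unit variance) and normalization as min-max scaling to [0, 1]. The function names and the toy feature column are hypothetical and not taken from the paper.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Z-score scaling (assumed definition): (x - mean) / population std."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def normalize(values):
    """Min-max scaling (assumed definition): (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has zero range; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy feature column, not data from the study.
feature = [2.0, 4.0, 6.0, 8.0]

z = standardize(feature)  # centered on 0 with unit spread
m = normalize(feature)    # squeezed into the interval [0, 1]

print("standardized:", [round(v, 4) for v in z])
print("normalized:  ", [round(v, 4) for v in m])
```

In a real train/test experiment, the scaling statistics (mean, standard deviation, minimum, maximum) would normally be computed on the training split only and then reused on the test split, so that no information from the test data leaks into preprocessing.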
Pages: 135300-135314
Page count: 15