When to Use Standardization and Normalization: Empirical Evidence From Machine Learning Models and XAI

Cited by: 4
Authors
Sujon, Khaled Mahmud [1]
Hassan, Rohayanti Binti [2]
Towshi, Zeba Tusnia [3]
Othman, Manal A. [4]
Samad, Md Abdus [5]
Choi, Kwonhue [5]
Affiliations
[1] Univ Teknol Malaysia UTM, Fac Comp, Dept Software Engn, Johor Baharu 81310, Johor, Malaysia
[2] Univ Teknol Malaysia UTM, Fac Comp, Johor Baharu 81310, Johor, Malaysia
[3] Independent Univ, Dept Comp Sci & Engn, Dhaka 1229, Bangladesh
[4] Princess Nourah Bint Abdulrahman Univ, Coll Med, Med Educ Dept, Biomed Informat, Riyadh 11671, Saudi Arabia
[5] Yeungnam Univ, Dept Informat & Commun Engn, Gyongsan 38541, South Korea
Source
IEEE Access | 2024 / Vol. 12
Keywords
Standardization; normalization; feature scaling; data preprocessing; machine learning; explainable AI (XAI)
DOI
10.1109/ACCESS.2024.3462434
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Optimizing machine learning (ML) model performance relies heavily on appropriate data preprocessing techniques. Despite the widespread use of standardization and normalization, empirical comparisons across different models, dataset sizes, and domains remain sparse. This study addresses that gap by evaluating five ML algorithms, namely Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Adaptive Boosting (AdaBoost), on datasets of varying sizes from the business, health, and agriculture domains. The models were assessed without scaling, with standardized data, and with normalized data. The comparative analysis reveals that standardization consistently improves the performance of linear models such as SVM and LR on large and medium datasets, whereas normalization improves the performance of linear models on small datasets. Moreover, this study employs SHapley Additive exPlanations (SHAP) summary plots to examine how each feature contributes to model performance and interpretability on unscaled and scaled datasets. This study provides practical guidelines for selecting appropriate scaling techniques based on the characteristics of datasets and compatibility with various algorithms. Finally, this investigation lays a foundation for data preprocessing and feature engineering across diverse models and domains, offering actionable insights for practitioners.
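For readers unfamiliar with the two scaling techniques compared in the abstract, the following is a minimal sketch in plain Python. It assumes that "standardization" means z-score scaling and that "normalization" means min-max scaling onto [0, 1]; the abstract does not spell out either definition, and the function names and sample values are hypothetical, not taken from the paper.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population std. dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def normalize(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has zero range; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature column (illustrative values only, not from the study).
incomes = [20_000, 35_000, 50_000, 65_000, 80_000]

z_scores = standardize(incomes)  # centered on 0 with unit variance
min_max = normalize(incomes)     # bounded to the interval [0, 1]

print(z_scores)
print(min_max)
```

In a real pipeline the scaling statistics (mean, standard deviation, minimum, maximum) would normally be computed on the training split only and then reused on the test split, so that no information leaks from the test data.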
Pages: 135300-135314
Number of pages: 15