A comparative analysis of machine learning techniques for imbalanced data

Cited by: 1
Authors
Mrad, Ali Ben [1, 2]
Lahiani, Amine [3, 4, 5]
Mefteh-Wali, Salma [6 ]
Mselmi, Nada [7 ]
Affiliations
[1] Qassim Univ, Coll Comp, Dept Comp Sci, Buraydah, Saudi Arabia
[2] Univ Sfax, CES Lab, ENIS, Sfax, Tunisia
[3] LEO Lab Econ Orleans, Orleans, France
[4] Gulf Univ Sci & Technol, Ctr Sustainable Dev, Kuwait, Kuwait
[5] South Ural State Univ, Chelyabinsk, Russia
[6] ESSCA Sch Management, Angers, France
[7] Paris Saclay Univ, RITM, Paris, France
Keywords
Bank inactivity; Classification; Machine learning; MULTIVARIATE STATISTICAL-ANALYSIS; BANK FAILURE; BANKRUPTCY PREDICTION; INVESTOR SENTIMENT; FINANCIAL RATIOS; NEURAL-NETWORK; DISTRESS; PERFORMANCE; INSOLVENCY; SECTOR;
DOI
10.1007/s10479-024-06018-0
Chinese Library Classification (CLC) number
C93 [Management Science]; O22 [Operations Research]
Discipline classification code
070105; 12; 1201; 1202; 120202
Abstract
This study compares the predictive accuracy of a set of machine learning models coupled with three resampling techniques (Random Undersampling, Random Oversampling, and Synthetic Minority Oversampling Technique) in predicting bank inactivity. Our sample includes listed banks in EU-28 member states between 2011 and 2019. We employ 23 financial ratios covering capital adequacy, asset quality, management capability, earnings, liquidity, and sensitivity indicators. The empirical findings establish that XGBoost performs exceptionally well as a classifier in predicting bank inactivity, particularly one year before the event. Our findings further indicate that random forest with Synthetic Minority Oversampling Technique achieves the highest predictive accuracy two years prior to inactivity, while XGBoost with Random Oversampling outperforms other methods three years in advance. The empirical results also emphasize the significance of management capability and loan quality ratios as key factors in predicting bank inactivity. Our findings present important policy implications.
The accuracy of machine learning techniques combined with resampling techniques in predicting bank inactivity is analyzed. Data on banks in the EU-28 member states between 2011 and 2019 are used. XGBoost performs exceptionally well one year before inactivity. Random Forest with Synthetic Minority Oversampling is the best classifier two years before inactivity. XGBoost with Random Oversampling outperforms other methods three years before inactivity.
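The three resampling techniques named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the toy ratios, and the labels (1 = inactive, 0 = active) are all hypothetical, and `smote_like` is a simplified version of the Synthetic Minority Oversampling idea (interpolating between a minority row and one of its nearest minority neighbours) rather than a faithful reproduction of any library routine.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(rows, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, minority, k=3, seed=0):
    """Simplified SMOTE: add synthetic minority rows by linear interpolation.

    Assumes the minority class has at least two rows.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    rows = [x for x, lab in zip(X, y) if lab == minority]
    X_out, y_out = list(X), list(y)
    for _ in range(target - counts[minority]):
        base = rng.choice(rows)
        # k nearest minority neighbours by squared Euclidean distance
        neighbours = sorted(
            (r for r in rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        X_out.append([a + gap * (b - a) for a, b in zip(base, other)])
        y_out.append(minority)
    return X_out, y_out


# Hypothetical toy data: two made-up ratios per bank, 8 active and 3 inactive.
X = [[0.10, 0.50], [0.12, 0.48], [0.11, 0.52], [0.09, 0.47],
     [0.13, 0.51], [0.10, 0.49], [0.12, 0.53], [0.11, 0.50],
     [0.30, 0.20], [0.32, 0.22], [0.28, 0.18]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

X_ros, y_ros = random_oversample(X, y)
X_rus, y_rus = random_undersample(X, y)
X_sm, y_sm = smote_like(X, y, minority=1, k=2)
print(Counter(y_ros), Counter(y_rus), Counter(y_sm))
```

In practice, resampling of this kind would normally be applied to the training split only, so that the evaluation data keep their original class balance.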
Pages: 27