A boosted decision tree approach using Bayesian hyper-parameter optimization for credit scoring

Cited by: 449
Authors
Xia, Yufei [1 ]
Liu, Chuanzhe [1 ]
Li, YuYing [2 ]
Liu, Nana [1 ]
Affiliations
[1] China Univ Min & Technol, Sch Management, Xuzhou 221116, Jiangsu, Peoples R China
[2] China Univ Min & Technol, Sch Foreign Studies, Xuzhou 221116, Jiangsu, Peoples R China
Keywords
Credit scoring; Boosted decision tree; Bayesian hyper-parameter optimization; ART CLASSIFICATION ALGORITHMS; BANKRUPTCY PREDICTION; RISK-ASSESSMENT; ENSEMBLE; CLASSIFIERS; REGRESSION; MACHINE; MODELS; FOREST;
DOI
10.1016/j.eswa.2017.02.017
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Credit scoring is an effective tool that helps banks make profitable decisions on granting loans. Ensemble methods, which can be divided by structure into parallel and sequential ensembles, have recently been developed in the credit scoring domain and have proven superior at discriminating borrowers accurately. However, existing ensemble models have given little consideration to the following: (1) tuning the hyper-parameters of the base learner, despite this being critical to a well-performing ensemble; (2) building sequential (i.e., boosting) models, as most work has combined the same or different algorithms in parallel; and (3) the comprehensibility of the resulting models. This paper proposes a sequential ensemble credit scoring model based on a variant of the gradient boosting machine, extreme gradient boosting (XGBoost). The model comprises three main steps. First, data pre-processing scales the data and handles missing values. Second, a model-based feature selection step driven by relative feature importance scores removes redundant variables. Third, the hyper-parameters of XGBoost are adaptively tuned with Bayesian hyper-parameter optimization, and the model is trained on the selected feature subset. Several hyper-parameter optimization methods and baseline classifiers serve as reference points in the experiments. Results demonstrate that Bayesian hyper-parameter optimization performs better than random search, grid search, and manual search. Moreover, the proposed model outperforms the baseline models on average across four evaluation measures: accuracy, error rate, the area under the curve (AUC) H measure (AUC-H measure), and Brier score. The proposed model also provides feature importance scores and a decision chart, which enhance the interpretability of the credit scoring model. (C) 2017 Elsevier Ltd. All rights reserved.
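The abstract outlines a three-step pipeline (pre-processing, importance-based feature selection, Bayesian tuning of XGBoost). Below is a minimal sketch of how such a pipeline could be assembled. The library choices (scikit-learn for pre-processing and selection, scikit-optimize's BayesSearchCV as the Bayesian optimizer), the synthetic data, and the search-space bounds are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of the three-step pipeline summarized in the abstract.
# Assumed stand-ins (not from the paper): scikit-learn, scikit-optimize,
# and a synthetic binary-classification dataset.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from skopt import BayesSearchCV
from skopt.space import Integer, Real
from xgboost import XGBClassifier

# Stand-in data; a real credit data set would be loaded here instead.
X, y = make_classification(n_samples=2000, n_features=30, n_informative=12,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

# Step 1: data pre-processing -- impute missing values, then scale.
imputer = SimpleImputer(strategy="median").fit(X_tr)
scaler = StandardScaler().fit(imputer.transform(X_tr))
X_tr = scaler.transform(imputer.transform(X_tr))
X_te = scaler.transform(imputer.transform(X_te))

# Step 2: model-based feature selection using XGBoost importance scores;
# features below the median importance are treated as redundant.
selector = SelectFromModel(XGBClassifier(n_estimators=100, random_state=0),
                           threshold="median").fit(X_tr, y_tr)
X_tr, X_te = selector.transform(X_tr), selector.transform(X_te)

# Step 3: Bayesian hyper-parameter optimization over an assumed
# XGBoost search space, maximizing cross-validated AUC.
search = BayesSearchCV(
    XGBClassifier(random_state=0),
    search_spaces={
        "n_estimators": Integer(100, 500),
        "max_depth": Integer(2, 8),
        "learning_rate": Real(0.01, 0.3, prior="log-uniform"),
        "subsample": Real(0.5, 1.0),
        "colsample_bytree": Real(0.5, 1.0),
        "min_child_weight": Integer(1, 10),
    },
    n_iter=30, cv=5, scoring="roc_auc", random_state=0,
)
search.fit(X_tr, y_tr)
print("best CV AUC :", search.best_score_)
print("test AUC    :", search.score(X_te, y_te))
```

The abstract's comparison against grid, random, and manual search could be approximated by swapping BayesSearchCV for GridSearchCV or RandomizedSearchCV over the same space; the paper itself does not prescribe these particular tools.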
Pages: 225-241
Page count: 17
Related Papers
50 records in total
  • [1] Bayesian Optimization for Accelerating Hyper-parameter Tuning
    Vu Nguyen
    [J]. 2019 IEEE SECOND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND KNOWLEDGE ENGINEERING (AIKE), 2019, : 302 - 305
  • [2] Hyper-parameter Tuning of a Decision Tree Induction Algorithm
    Mantovani, Rafael G.
    Horvath, Tomas
    Cerri, Ricardo
    Vanschoren, Joaquin
    de Carvalho, Andre C. P. L. F.
    [J]. PROCEEDINGS OF 2016 5TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS 2016), 2016, : 37 - 42
  • [3] A Modified Bayesian Optimization based Hyper-Parameter Tuning Approach for Extreme Gradient Boosting
    Putatunda, Sayan
    Rama, Kiran
    [J]. 2019 FIFTEENTH INTERNATIONAL CONFERENCE ON INFORMATION PROCESSING (ICINPRO): INTERNET OF THINGS, 2019, : 6 - 11
  • [4] Hyper-parameter Optimization Using Continuation Algorithms
    Rojas-Delgado, Jairo
    Jimenez, J. A.
    Bello, Rafael
    Lozano, J. A.
    [J]. METAHEURISTICS, MIC 2022, 2023, 13838 : 365 - 377
  • [5] Framework for classification of cancer gene expression data using Bayesian hyper-parameter optimization
    Koul, Nimrita
    Manvi, Sunilkumar S.
    [J]. MEDICAL & BIOLOGICAL ENGINEERING & COMPUTING, 2021, 59 (11-12) : 2353 - 2371
  • [6] Credit scoring based on a Bagging-cascading boosted decision tree
    Zou, Yao
    Gao, Changchun
    Xia, Meng
    Pang, Congyuan
    [J]. INTELLIGENT DATA ANALYSIS, 2022, 26 (06) : 1557 - 1578
  • [7] A Hyper-Parameter Optimization Approach to Automated Radiotherapy Treatment Planning
    Haaf, S.
    Kearney, V.
    Interian, Y.
    Valdes, G.
    Solberg, T.
    Perez-Andujar, A.
    [J]. MEDICAL PHYSICS, 2017, 44 (06) : 2901 - 2901
  • [8] Random Search for Hyper-Parameter Optimization
    Bergstra, James
    Bengio, Yoshua
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2012, 13 : 281 - 305
  • [9] Hyper-parameter Optimization for Latent Spaces
    Veloso, Bruno
    Caroprese, Luciano
    Konig, Matthias
    Teixeira, Sonia
    Manco, Giuseppe
    Hoos, Holger H.
    Gama, Joao
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021: RESEARCH TRACK, PT III, 2021, 12977 : 249 - 264