Consumer credit risk: Individual probability estimates using machine learning

Times cited: 115
Authors
Kruppa, Jochen [1]
Schwarz, Alexandra [2]
Arminger, Gerhard [2]
Ziegler, Andreas [1]
Affiliations
[1] Univ Lubeck, Univ Klinikum Schleswig Holstein, Inst Med Biometrie & Stat, D-23562 Lubeck, Germany
[2] Univ Wuppertal, Schumpeter Sch Business & Econ, D-42097 Wuppertal, Germany
Keywords
Probability estimation; Random forest; Credit scoring; Probability machines; Logistic regression; Machine learning; IMPROVED CONFIDENCE-INTERVALS; CLASSIFICATION ALGORITHMS; RANDOM FORESTS; CONVERGENCE; PERFORMANCE; CONSISTENCY; PREDICTION; REGRESSION
DOI
10.1016/j.eswa.2013.03.019
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Consumer credit scoring is often considered a classification task where clients receive either a good or a bad credit status. Default probabilities provide more detailed information about the creditworthiness of consumers, and they are usually estimated by logistic regression. Here, we present a general framework for estimating individual consumer credit risks by use of machine learning methods. Since a probability is an expected value, all nonparametric regression approaches which are consistent for the mean are consistent for the probability estimation problem. Among others, random forests (RF), k-nearest neighbors (kNN), and bagged k-nearest neighbors (bNN) belong to this class of consistent nonparametric regression approaches. We apply the machine learning methods and an optimized logistic regression to a large dataset of complete payment histories of short-term installment credits. We demonstrate probability estimation in Random Jungle, an RF package written in C++ with a generalized framework for fast tree growing, probability estimation, and classification. We also describe an algorithm for tuning the terminal node size for probability estimation. We demonstrate that regression RF outperforms the optimized logistic regression model, kNN, and bNN on the test data of the short-term installment credits. (c) 2013 Elsevier Ltd. All rights reserved.
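The abstract's central argument is that for a 0/1 default indicator Y, the default probability P(Y=1 | x) equals the conditional mean E[Y | x], so a regression method that is consistent for the mean also estimates the probability. Below is a minimal pure-Python sketch of that idea for kNN and bagged kNN. It is not the authors' code and does not use Random Jungle. The function names (`knn_prob`, `bagged_knn_prob`, `brier_score`, `choose_k`), the Euclidean distance, the Brier score as an evaluation measure, and the grid search over k (a loose analogue of the terminal-node-size tuning the abstract mentions, not the paper's algorithm) are all assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist(a: Point, b: Point) -> float:
    # Euclidean distance between two feature vectors (assumed metric).
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def knn_prob(X: Sequence[Point], y: Sequence[int], x: Point, k: int) -> float:
    """Estimate P(Y=1 | x) as the mean 0/1 label of the k nearest neighbours.

    Because P(Y=1 | x) = E[Y | x] for a 0/1 response, this is kNN
    *regression* on the labels, not a majority-vote classifier.
    """
    order = sorted(range(len(X)), key=lambda i: _dist(X[i], x))
    nearest = order[:k]
    return sum(y[i] for i in nearest) / len(nearest)


def bagged_knn_prob(X: Sequence[Point], y: Sequence[int], x: Point,
                    k: int, n_bags: int = 50, seed: int = 0) -> float:
    """Bagged kNN: average knn_prob over bootstrap resamples of the data."""
    rng = random.Random(seed)
    n = len(X)
    total = 0.0
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        total += knn_prob([X[i] for i in idx], [y[i] for i in idx], x, k)
    return total / n_bags


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - t) ** 2 for p, t in zip(probs, labels)) / len(labels)


def choose_k(X_tr: Sequence[Point], y_tr: Sequence[int],
             X_val: Sequence[Point], y_val: Sequence[int],
             ks: List[int]) -> int:
    """Hypothetical tuning step: pick the k with the lowest validation Brier score."""
    def score(k: int) -> float:
        probs = [knn_prob(X_tr, y_tr, x, k) for x in X_val]
        return brier_score(probs, y_val)
    return min(ks, key=score)
```

For example, with the toy data `X = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]` and `y = [0, 0, 0, 1, 1, 1]`, `knn_prob(X, y, (2.6,), k=3)` averages the labels of the points at 3.0, 2.0 and 4.0 and returns 2/3, a probability rather than a hard good/bad class. Averaging over bootstrap samples, as `bagged_knn_prob` does, smooths this estimate in the same spirit as the bNN method named above.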
Pages: 5125-5131
Number of pages: 7