Combining multiple probability predictions in the presence of class imbalance to discriminate between potential bad and good borrowers in the peer-to-peer lending market

Cited by: 20
Author
Zanin, Luca [1]
Affiliation
[1] Prometeia, Piazza Trento & Trieste 3, I-40137 Bologna, Italy
Keywords
Class imbalance; Machine learning; Combining multiple probability predictions; Credit risk scoring prediction; Peer-to-peer lending; GENERALIZED ADDITIVE-MODELS; CREDIT; REGRESSION; INFORMATION; SELECTION; AREAS;
DOI
10.1016/j.jbef.2020.100272
Chinese Library Classification
F8 [Public finance, Finance];
Discipline classification code
0202;
Abstract
Credit risk scoring predictions represent an effective guide for lenders to discriminate between potential good (who will repay the loan) and bad (who will default) borrowers in the online social lending market. A common characteristic of such a market is a lower percentage of defaulted borrowers than non-defaulted borrowers; thus, the sample is class imbalanced. Class imbalance may affect the accuracy of default predictions, as classifiers tend to be biased towards the majority class (good borrowers). We analyse the default prediction performance when combining class rebalancing methods with different regression and machine learning techniques. We also propose to combine multiple probability predictions to improve the predictive performance. The analysis is based on a book of loans (with a three-year term) funded in the 2010-2015 period through the online platform of Lending Club. The results show that some measures of predictive accuracy tend to improve when the scoring models are trained using a rebalanced, rather than an imbalanced, sample, except when the extreme gradient boosting approach is applied. Finally, we find that combining multiple probability predictions via regularised logistic regression may help to improve the predictive accuracy. (c) 2020 Elsevier B.V. All rights reserved.
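As a rough illustration of the last idea in the abstract, the sketch below combines the default probabilities of two base models with a regularised logistic regression. It is not the paper's implementation. The function names (`fit_combiner`, `combine`), the ridge (L2) penalty, the logit transform of the base probabilities, the hyperparameters and the synthetic data are all assumptions made for this example; the abstract does not specify any of them.

```python
import math
import random


def logit(p, eps=1e-6):
    # Clip to avoid infinite values at p = 0 or p = 1.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_combiner(base_probs, y, lam=0.1, lr=0.1, n_iter=500):
    """Fit an L2-penalised logistic regression on the logits of the base
    probabilities (hypothetical helper; the intercept is not penalised).

    base_probs: one row per loan, one column per base model.
    y: 1 for a defaulted loan, 0 otherwise.
    """
    X = [[logit(p) for p in row] for row in base_probs]
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(n_iter):
        gw, gb = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(k):
                gw[j] += err * xi[j]
        # Gradient step on the mean log-loss plus the ridge penalty.
        b -= lr * gb / n
        w = [wj - lr * (gwj / n + lam * wj) for wj, gwj in zip(w, gw)]
    return b, w


def combine(b, w, row):
    # Combined default probability for one loan.
    return sigmoid(b + sum(wj * logit(p) for wj, p in zip(w, row)))


# Synthetic, class-imbalanced data (assumed ~15% defaults) with two noisy
# base-model predictions; purely illustrative, not Lending Club data.
random.seed(0)
y = [1 if random.random() < 0.15 else 0 for _ in range(300)]


def clip(v):
    return min(max(v, 0.01), 0.99)


base_probs = [
    [clip(0.15 + 0.3 * yi + random.gauss(0, 0.15)),
     clip(0.15 + 0.2 * yi + random.gauss(0, 0.20))]
    for yi in y
]

b, w = fit_combiner(base_probs, y)
combined = [combine(b, w, row) for row in base_probs]
```

In practice the combiner would be fitted on out-of-sample base predictions (for example from cross-validation) to limit overfitting, and the penalty strength `lam` would be tuned rather than fixed.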
Pages: 12