Comparative Analysis of Predictive Analytics Models in Classification Problems

Cited by: 1
Authors
Polyakov, Konstantin [1]
Liudmila, Zhukova [1, 2]
Affiliations
[1] Natl Res Univ Higher Sch Econ, Fac Econ Sci, Moscow, Russia
[2] EC Leasing, Moscow, Russia
Keywords
predictive analytics; descriptive analytics; social media analysis; microfinance organization
DOI
10.1109/APSSE47353.2019.00028
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This research is devoted to a comparative analysis of classification quality for several methods of descriptive and predictive analytics in the case where most (or all) of the independent variables are measured on a qualitative scale with a large number of levels. In this case, some classification methods, or their popular implementations, require qualitative variables to be converted into sets of dummy variables. If the qualitative scales have a large number of levels that are present in almost equal proportions in the training set, so that merging levels makes no sense, this requirement leads to a dramatic rise in the dimension of the problem. As a result, the researcher faces the curse of dimensionality: as the problem dimension rises, the sample size must rise as well to preserve the accuracy with which the impact of the factors is estimated. At the same time, it is not always possible to arrange a corresponding growth of the training set; in some cases it is limited by specific properties of the object (system) under study. In such a situation it is extremely important to evaluate the sensitivity of prediction/classification methods to the curse of dimensionality. The authors focus on four classification methods that have long held top positions in lists of popular business analysis methods: two methods of building classification trees (CART and C4.5), logistic regression, and classification based on random forest. The first three are descriptive methods that yield interpretable (human-readable) models; the fourth belongs to predictive analytics. The selection is not random. Descriptive analytics problems are extremely important for planning, when it is necessary to answer the question "What will happen if ... ?". In particular, one needs a description of the target group in order to organize marketing communication.
At the same time, it is quite conceivable that using interpretable (human-readable) models involves a loss of prediction quality compared with methods of predictive analytics. The research domain is the activity of microfinance institutions (MFIs). The traditional problem here is the assessment of a potential client. The main challenge that arises in solving this problem is the constraints on the volume, composition and type of data available for predicting default or assessing default probability. Thus, it is necessary to evaluate the abilities of classification methods that were designed to work with large amounts of data (that is, a large training set and many variables, from which the most important should be selected). In the real practice of a microfinance organization, most recorded factors are measured on qualitative scales with a large number of levels, which is the origin of the above-mentioned problems. The empirical part of the research is based on the data of a real microfinance organization. Some hypotheses about the reasons for default were tested as a by-product of this research.
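The growth in dimension that the abstract attributes to dummy coding can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the drop-first (reference level) convention, and the figures of 5 variables with 30 levels each are assumptions chosen only for illustration.

```python
def encoded_dimension(level_counts, drop_first=True):
    """Number of dummy columns produced by a list of qualitative variables.

    level_counts[i] is the number of distinct levels of variable i.
    With drop_first=True one reference level per variable is omitted.
    """
    return sum(k - 1 if drop_first else k for k in level_counts)


def dummy_columns(rows, drop_first=True):
    """Expand rows of qualitative values into 0/1 dummy columns.

    Returns (header, encoded), where header is a list of
    (variable_index, level) pairs, one per dummy column.
    """
    n_vars = len(rows[0])
    # Distinct levels observed for each variable, in sorted order.
    levels = [sorted({row[j] for row in rows}) for j in range(n_vars)]
    if drop_first:
        levels = [lv[1:] for lv in levels]
    header = [(j, level) for j, lv in enumerate(levels) for level in lv]
    encoded = [
        [1 if row[j] == level else 0 for (j, level) in header]
        for row in rows
    ]
    return header, encoded


# Hypothetical example: 5 qualitative variables with 30 levels each.
level_counts = [30] * 5
print(len(level_counts), "variables ->",
      encoded_dimension(level_counts), "dummy columns")  # 5 -> 145
```

Under these assumed figures, five original variables become 145 model inputs, so a training set that looked adequate for five factors has to support the estimation of 145 coefficients.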
Pages: 162 - 169
Number of pages: 8
Related papers
  • [1] Comparative Analysis of Predictive Interstitial Glucose Level Classification Models
    Kistkins, Svjatoslavs
    Mihailovs, Timurs
    Lobanovs, Sergejs
    Pirags, Valdis
    Sourij, Harald
    Moser, Othmar
    Bliznuks, Dmitrijs
    [J]. SENSORS, 2023, 23 (19)
  • [2] Predictive analytics: Parametric models for regression and classification using R
    Huang, Yu-Jyun
    [J]. BIOMETRICS, 2022, 78 (02) : 816 - 817
  • [3] A Survey on Predictive Models of Learning Analytics
    Ranjeeth, S.
    Latchoumi, T. P.
    Paul, P. Victer
    [J]. INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND DATA SCIENCE, 2020, 167 : 37 - 46
  • [4] Advanced Analytics Improve Predictive Models
    Reckamp, Joseph
    [J]. InTech, 2021, 68 (06)
  • [5] How to anticipate maintenance problems with predictive analytics
    Reckamp, Joe
    [J]. Plant Engineering, 2024, 78 (04) : 26 - 29
  • [6] A comparative predictive analysis of neural networks (NNs), nonlinear regression and classification and regression tree (CART) models
    Razi, MA
    Athappilly, K
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2005, 29 (01) : 65 - 74
  • [7] Comparative analysis and classification of features for image models
    Gurevich I.B.
    Koryabkina I.V.
    [J]. Pattern Recognition and Image Analysis, 2006, 16 (3) : 265 - 297
  • [8] Freight Generation Models Comparative Analysis of Regression Models and Multiple Classification Analysis
    Bastida, Carlos
    Holguin-Veras, Jose
    [J]. TRANSPORTATION RESEARCH RECORD, 2009, (2097) : 51 - 61
  • [9] Calibration Techniques for Binary Classification Problems: A Comparative Analysis
    Martino, Alessio
    De Santis, Enrico
    Baldini, Luca
    Rizzi, Antonello
    [J]. IJCCI: PROCEEDINGS OF THE 11TH INTERNATIONAL JOINT CONFERENCE ON COMPUTATIONAL INTELLIGENCE, 2019 : 487 - 495
  • [10] Comparative Analysis of MultiCriteria Inventory Classification Models for ABC Analysis
    Kaabi, Hadhami
    [J]. International Journal of Information Technology and Decision Making, 2022, 21 (05) : 1617 - 1646