Comparative Analysis of Predictive Analytics Models in Classification Problems

Cited by: 1
Authors
Polyakov, Konstantin [1]
Liudmila, Zhukova [1, 2]
Affiliations
[1] Natl Res Univ Higher Sch Econ, Fac Econ Sci, Moscow, Russia
[2] EC Leasing, Moscow, Russia
Keywords
predictive analytics; descriptive analytics; social media analysis; microfinance organization
DOI
10.1109/APSSE47353.2019.00028
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This research presents a comparative analysis of classification quality for several methods of descriptive and predictive analytics in the case where most (or all) of the independent variables are measured on a qualitative scale with a large number of levels. In this case, some classification methods, or their popular implementations, require qualitative variables to be converted into systems of dummy variables. If the qualitative scales have many levels that appear in almost equal proportions in the training set, so that merging levels does not make sense, this requirement leads to a dramatic rise in the dimension of the problem. As a result, the researcher faces the curse of dimensionality: as the problem dimension rises, the sample size must rise as well to preserve the accuracy of the factor impact estimates. At the same time, it is not always possible to grow the training set accordingly; in some cases its size is limited by specific properties of the object (system) under study. In such a situation, it is extremely important to evaluate the sensitivity of prediction and classification methods to the curse of dimensionality. The authors focus on four classification methods that have long held top positions in lists of popular business analysis methods: two methods of classification tree building (CART and C4.5), logistic regression, and classification based on random forest. The first three are descriptive methods, which yield interpretable (human-readable) models; the fourth belongs to predictive analytics. The selection is not random. Descriptive analytics problems are extremely important for planning, when it is necessary to answer the question "What will happen if ... ?". In particular, one needs a description of the target group in order to organize marketing communication.
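The growth in dimension caused by dummy coding can be illustrated with a small sketch. It is not taken from the paper: the function names, the number of variables, the level counts and the sample size are all hypothetical and chosen only for illustration.

```python
def dummy_columns(level_counts, drop_first=True):
    """Count the 0/1 columns needed to dummy-code categorical variables.

    level_counts: number of levels of each categorical variable.
    drop_first:   if True, one reference level per variable is dropped
                  (k - 1 columns), as is usual for regression models
                  with an intercept; otherwise k columns are used.
    """
    return sum((k - 1 if drop_first else k) for k in level_counts)


def rows_per_column(n_rows, level_counts, drop_first=True):
    """Average number of training rows available per dummy column."""
    return n_rows / dummy_columns(level_counts, drop_first)


# Hypothetical setting: 5 qualitative predictors with 40 levels each
# and a training set of 2000 rows.
levels = [40] * 5
print(dummy_columns(levels))           # 5 * 39 = 195 columns
print(rows_per_column(2000, levels))   # roughly 10 rows per column
```

Under these assumed numbers, five original variables become 195 model inputs, leaving only about ten observations per estimated effect unless the training set grows.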
At the same time, it is quite conceivable that using interpretable (human-readable) models involves a loss of prediction quality compared with methods of predictive analytics. The research domain is the activity of microfinance institutions (MFIs). The traditional problem here is the assessment of potential clients. The main challenge in solving this problem is the constraints on the volume, composition and type of data available for predicting default or assessing default probability. Thus, it is necessary to evaluate the abilities of classification methods that were designed to work with large amounts of data (that is, a large training set and many variables, from which the most important should be selected). In the real practice of a microfinance organization, most recorded factors are measured on qualitative scales with a large number of levels, which is the origin of the above-mentioned problems. The empirical part of the research is based on the data of a real microfinance organization. Some hypotheses about the reasons for default were tested as a by-product of this research.
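As background on why many-level qualitative variables matter for the tree methods named in the abstract: C4.5 ranks candidate splits by gain ratio (information gain divided by the split information), which penalizes attributes with many levels, whereas CART typically uses binary splits scored by Gini impurity. The sketch below is a minimal illustration of the gain ratio calculation only; it is not the authors' code, and the toy data are invented.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def information_gain(values, labels):
    """Entropy reduction from a multiway split on a categorical attribute."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalized by the split information of the attribute."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


# Toy data: 8 clients, label 1 = default, 0 = no default (invented).
labels = [0, 1, 0, 1, 0, 1, 0, 1]
id_like = list(range(8))   # 8 distinct levels, one per client
binary = list(labels)      # 2 levels, perfectly aligned with the label

print(information_gain(id_like, labels), gain_ratio(id_like, labels))
print(information_gain(binary, labels), gain_ratio(binary, labels))
```

Both attributes reach the same information gain on this toy set, but the attribute with eight levels gets a gain ratio of one third against one for the binary attribute, which shows how the normalization discounts splits that merely fragment the data.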
Pages: 162 - 169
Number of pages: 8