Decision tree approaches for zero-inflated count data

Cited by: 17
Authors
Lee, Seong-Keon
Jin, Seohoon
Affiliations
[1] Sungshin Womens Univ, Dept Stat, Seoul 136742, South Korea
[2] Hyundai Capital, Seoul, South Korea
Keywords
data mining; decision tree; homogeneity; maximum likelihood; zero inflated Poisson (ZIP);
DOI
10.1080/02664760600743613
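Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract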
Many methodologies for zero-inflated data have been developed in the field of statistics. However, the data mining literature has paid little attention to the problem, even though zero-inflated data are common in real applications. In fact, there is no decision tree method suited to zero-inflated responses. When decision trees are used as a data mining technique to analyze a continuous target variable, the best split is found with an F-statistic (CHAID) or a variance reduction (CART) criterion. These criteria, however, are appropriate only for a continuous target variable. If the target variable records rare events or is a zero-inflated count, they may not give good results because of the nature of such data. In this paper, we propose a decision tree for zero-inflated count data that uses the maximized zero-inflated Poisson (ZIP) likelihood as the split criterion. In addition, we compare the performance of the split criteria on well-known data sets. When the analyst is interested in lower-value groups (e.g. no-defect areas, customers who do not claim), the suggested ZIP tree would be more efficient.
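The split criterion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an intercept-only ZIP model in each node, fitted by a simple EM iteration, and scores a candidate split by the gain in maximized log-likelihood (left child plus right child minus parent). The function names `zip_loglik` and `best_zip_split` and the `min_leaf` parameter are hypothetical and chosen only for illustration.

```python
import math

def zip_loglik(counts, n_iter=200, tol=1e-10):
    """Maximized intercept-only ZIP log-likelihood of one node (EM fit).

    Model assumed here: P(Y=0) = pi + (1-pi)*exp(-lam),
    P(Y=k) = (1-pi)*exp(-lam)*lam**k/k! for k > 0.
    """
    n = len(counts)
    total = sum(counts)
    n_zero = sum(1 for y in counts if y == 0)
    if n == 0 or total == 0:
        # Empty or all-zero node: the likelihood tends to 1 as lam -> 0.
        return 0.0
    pi, lam = 0.5 * n_zero / n, total / n  # starting values
    for _ in range(n_iter):
        p0 = pi + (1.0 - pi) * math.exp(-lam)
        # E-step: probability that an observed zero is a structural zero.
        z = pi / p0 if p0 > 0 else 0.0
        # M-step: update the mixing weight and the Poisson mean.
        new_pi = n_zero * z / n
        new_lam = total / (n - n_zero * z)
        converged = abs(new_pi - pi) + abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    n_pos = n - n_zero
    ll = n_pos * math.log(1.0 - pi) - n_pos * lam + total * math.log(lam)
    ll -= sum(math.lgamma(y + 1) for y in counts)  # minus sum of log(y!)
    if n_zero:
        ll += n_zero * math.log(pi + (1.0 - pi) * math.exp(-lam))
    return ll

def best_zip_split(x, y, min_leaf=5):
    """Return (threshold, gain) for the best binary split on one numeric x."""
    pairs = sorted(zip(x, y))
    parent = zip_loglik([c for _, c in pairs])
    best = (None, 0.0)
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between tied predictor values
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        gain = zip_loglik(left) + zip_loglik(right) - parent
        if gain > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, gain)
    return best

# Toy data: a zero-heavy group (x < 10) followed by a high-count group.
x = list(range(20))
y = [0, 0, 0, 1, 0, 0, 2, 0, 0, 0, 5, 7, 6, 4, 8, 5, 6, 7, 5, 6]
threshold, gain = best_zip_split(x, y)
print(threshold, round(gain, 3))
```

On this toy data the chosen threshold separates the zero-heavy observations from the high-count ones. A full tree would apply the same search recursively over all predictors, with a stopping rule that this sketch does not attempt to reproduce.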
Pages: 853-865
Page count: 13