Regression-based imputation of explanatory discrete missing data

Cited by: 1
Authors
Hernandez-Herrera, Gilma [1, 2]
Navarro, Albert [1]
Morina, David [3]
Affiliations
[1] Univ Autonoma Barcelona, Res Grp Psychosocial Risks, Unitat Bioestadist, Fac Med,Org Work & Hlth POWAH, Barcelona, Spain
[2] Univ Antioquia, Fac Med, Inst Invest Med, Medellin, Colombia
[3] Univ Barcelona, Dept Econometr Stat & Appl Econ, Riskctr IREA, Barcelona, Spain
Keywords
COMPoisson; Count data; Hermite; Missing data; Multiple imputation; Zero-inflated; Zero-inflated Poisson; Generalized Hermite; Semiparametric estimation; Model
DOI
10.1080/03610918.2022.2149805
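Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714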
Abstract
Imputing missing values is a strategy for handling non-response in surveys or data loss in measurement processes, and it may be more effective than simply omitting the incomplete records. The characteristics of the variable with missing values must be considered when choosing an imputation method; in particular, the literature is scarce for the case in which that variable is a count. If the variable has an excess of zeros, models that include parameters for zero-inflation need to be considered. Likewise, if over- or under-dispersion is observed, generalizations of the Poisson distribution, such as the Hermite or Conway-Maxwell-Poisson (COMPoisson) distributions, are recommended for imputation. The aim of this study was to assess the performance of various regression models based on Poisson generalizations in the imputation of a discrete variable, in comparison with classical count models, through a comprehensive simulation study covering a variety of scenarios and a real data example. To do so, we compared estimates obtained using only complete data with estimates obtained using imputations based on the most common count models. In general, the COMPoisson distribution provides better results in any dispersion scenario, especially when the amount of missing information is large.
Pages: 4363-4379
Number of pages: 17