Concurrent generation of multivariate mixed data with variables of dissimilar types

Cited by: 2
Authors
Amatya, Anup [1 ]
Demirtas, Hakan [2 ]
Affiliations
[1] New Mexico State Univ, Dept Publ Hlth Sci, 1335 Int Mall,RM 102, Las Cruces, NM 88011 USA
[2] Univ Illinois, Div Epidemiol & Biostat MC923, Chicago, IL USA
Keywords
Generalized Poisson; multivariate ordinal; discretization; CYSTITIS DATA-BASE; ORDINAL DATA; COUNT DATA; SIMULATION; DISTRIBUTIONS; MATRIX;
DOI
10.1080/00949655.2016.1177530
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Data sets originating from a wide range of research studies are composed of multiple variables that are correlated and of dissimilar types, primarily count, binary/ordinal and continuous attributes. The present paper builds on previous work on multivariate data generation and develops a framework for generating multivariate mixed data with a pre-specified correlation matrix. The generated data consist of components that are marginally count, binary, ordinal and continuous, where the count and continuous variables follow the generalized Poisson and normal distributions, respectively. The use of the generalized Poisson distribution provides a flexible mechanism that allows for the under- and over-dispersed count variables generally encountered in practice. A step-by-step algorithm is provided, and its performance is evaluated using simulated and real-data scenarios.
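The general idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it draws correlated standard normals and maps each component to a generalized Poisson count, an ordinal category, or a normal variable by inverse-CDF transformation. The correlation matrix here is an assumption placed on the latent normal scale, so the correlations of the generated variables will generally differ from it; the step that adjusts an intermediate matrix to reach a pre-specified target correlation is omitted. All function names and parameter values are hypothetical, and the probability mass function assumes Consul's parameterization with theta > 0 and 0 <= lam < 1 (over-dispersion only).

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a small symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gpois_pmf(x, theta, lam):
    """Generalized Poisson pmf (assumed form), for theta > 0 and 0 <= lam < 1."""
    log_p = (math.log(theta) + (x - 1) * math.log(theta + x * lam)
             - theta - x * lam - math.lgamma(x + 1))
    return math.exp(log_p)


def gpois_quantile(u, theta, lam, x_max=1000):
    """Smallest x whose cumulative probability reaches u (inverse-CDF method)."""
    u = min(u, 1.0 - 1e-12)  # guard against u rounding to exactly 1
    cdf = 0.0
    for x in range(x_max + 1):
        cdf += gpois_pmf(x, theta, lam)
        if cdf >= u:
            return x
    return x_max


def ordinal_quantile(u, cum_probs):
    """Map u to a category 1..K given cumulative probabilities of categories 1..K-1."""
    return 1 + sum(u > c for c in cum_probs)


def generate_mixed(n, latent_corr, theta, lam, cum_probs, mean, sd, seed=0):
    """Return n rows of (count, ordinal, continuous) from correlated latent normals."""
    rng = random.Random(seed)
    low = cholesky(latent_corr)
    rows = []
    for _ in range(n):
        e = [rng.gauss(0.0, 1.0) for _ in range(3)]
        # Correlate the independent draws: z = L e
        z = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(3)]
        count = gpois_quantile(norm_cdf(z[0]), theta, lam)
        ordinal = ordinal_quantile(norm_cdf(z[1]), cum_probs)
        continuous = mean + sd * z[2]
        rows.append((count, ordinal, continuous))
    return rows


# Illustrative (made-up) settings: one over-dispersed count, one 3-level ordinal,
# one normal variable.
latent_corr = [[1.0, 0.4, 0.3],
               [0.4, 1.0, 0.2],
               [0.3, 0.2, 1.0]]
data = generate_mixed(2000, latent_corr, theta=2.0, lam=0.3,
                      cum_probs=[0.3, 0.7], mean=10.0, sd=2.0, seed=1)
```

With lam = 0 the count component reduces to an ordinary Poisson variable. Under-dispersion (negative lam) would need a truncated support and is left out of this sketch.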
Pages: 3595-3607
Number of pages: 13
Related papers
50 in total
  • [21] State space models for multivariate longitudinal data of mixed types (vol 24, pg 385, 1996)
    Jorgensen, B
    LundbyeChristensen, S
    Song, PXK
    Sun, L
    CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 1997, 25 (03): : 425 - 425
  • [22] A parallel algorithm for ridge-penalized estimation of the multivariate exponential family from data of mixed types
    Trip, Diederik S. Laman
    van Wieringen, Wessel N.
    STATISTICS AND COMPUTING, 2021, 31 (04)
  • [23] A parallel algorithm for ridge-penalized estimation of the multivariate exponential family from data of mixed types
    Diederik S. Laman Trip
    Wessel N. van Wieringen
    Statistics and Computing, 2021, 31
  • [24] Multivariate nonparametric resampling scheme for generation of daily weather variables
    Rajagopalan, B
    Lall, U
    Tarboton, DG
    Bowles, DS
    STOCHASTIC HYDROLOGY AND HYDRAULICS, 1997, 11 (01): : 65 - 93
  • [25] Multivariate nonparametric resampling scheme for generation of daily weather variables
    B. Rajagopalan
    U. Lall
    D. G. Tarboton
    D. S. Bowles
    Stochastic Hydrology and Hydraulics, 1997, 11 : 65 - 93
  • [26] A joint modeling and estimation method for multivariate longitudinal data with mixed types of responses to analyze physical activity data generated by accelerometers
    Li, Haocheng
    Zhang, Yukun
    Carroll, Raymond J.
    Keadle, Sarah Kozey
    Sampson, Joshua N.
    Matthews, Charles E.
    STATISTICS IN MEDICINE, 2017, 36 (25) : 4028 - 4040
  • [27] Selection of variables for interpreting multivariate gas sensor data
    Eklöv, T
    Mårtensson, P
    Lundström, I
    ANALYTICA CHIMICA ACTA, 1999, 381 (2-3) : 221 - 232
  • [28] Searching for optimal variables in real multivariate stochastic data
    Raischel, F.
    Russo, A.
    Haase, M.
    Kleinhans, D.
    Lind, P. G.
    PHYSICS LETTERS A, 2012, 376 (30-31) : 2081 - 2089
  • [29] Multivariate probit linear mixed models for multivariate longitudinal binary data
    Lee, Kuo-Jung
    Kim, Chanmin
    Yoo, Jae Keun
    Lee, Keunbaik
    STATISTICS IN MEDICINE, 2024, 43 (08) : 1527 - 1548
  • [30] Graphical model for mixed data types
    Wu, Qiying
    Wang, Huiwen
    Lu, Shan
    Sun, Hui
    NEUROCOMPUTING, 2025, 611