A genetic algorithm for simulating correlated binary data from biomedical research

Cited by: 8
Authors
Kruppa, Jochen [1]
Lepenies, Bernd [2,3]
Jung, Klaus [1,3]
Affiliations
[1] Univ Vet Med Hannover, Inst Anim Breeding & Genet, Bunteweg 17p, D-30559 Hannover, Germany
[2] Univ Vet Med Hannover, Immunol Unit, Hannover, Germany
[3] Univ Vet Med Hannover, Res Ctr Emerging Infect & Zoonoses RIZ, Hannover, Germany
Keywords
Correlated binary data; Genetic algorithm; High-dimensional data; Random number generation; Computer simulation; Distributions; Association; Variables; Models
DOI
10.1016/j.compbiomed.2017.10.023
Chinese Library Classification
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Correlated binary data arise in many areas of biomedical research. Evaluating methods for their analysis often requires computer simulations of such data. Existing methods often cannot cover the full range of possible correlations between the variables, or are not available as implemented software. We propose a genetic algorithm that approximates a desired correlation structure under a given marginal distribution. The procedure generates a large representative matrix from which the probabilities of individual observations can be derived or from which samples can be drawn directly. Our genetic algorithm is evaluated under different specified marginal frequencies and correlation structures, and is compared with two existing approaches. The evaluation assesses the speed and precision of the approach as well as its suitability for generating high-dimensional data. Using an example of high-throughput glycan array data, we demonstrate how our approach can be used to simulate the power of global test procedures. Implementations of our method and of two other methods were added to the R package 'RepeatedHighDim'. The presented algorithm is not restricted to particular correlation structures. In contrast to existing methods, it has also been evaluated for high-dimensional data.
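The abstract only outlines the procedure, so the following Python sketch illustrates the general idea under our own assumptions; it is not the authors' algorithm. Each candidate solution is an n × d binary "representative matrix". We assume the marginals are fixed exactly by giving column j round(n·p_j) ones and by using crossover and mutation operators that never change a column sum, so the fitness only has to measure the squared distance between the realized phi (Pearson) correlations and the target correlations. Truncation selection, the operators, the function names (`genetic_binary`, `draw_sample`, `pattern_probability`) and all default parameters are hypothetical and are not taken from the paper or from the `RepeatedHighDim` package.

```python
# Hypothetical sketch: operators, names and defaults are assumptions,
# not the paper's or RepeatedHighDim's implementation.
import math
import random


def phi(ones_a, ones_b, n):
    """Phi (Pearson) correlation of two binary columns stored as sets of row indices equal to 1."""
    sa, sb = len(ones_a), len(ones_b)
    n11 = len(ones_a & ones_b)  # rows in which both variables are 1
    return (n * n11 - sa * sb) / math.sqrt(sa * (n - sa) * sb * (n - sb))


def fitness(cols, target, n):
    """Squared error between realized and target pairwise correlations (lower is better)."""
    d = len(cols)
    return sum((phi(cols[j], cols[k], n) - target[j][k]) ** 2
               for j in range(d) for k in range(j + 1, d))


def crossover(a, b, rng):
    """Uniform column-wise crossover: each child column is copied whole from one parent."""
    return [set(ca if rng.random() < 0.5 else cb) for ca, cb in zip(a, b)]


def mutate(cols, n, rng):
    """Move a 1 to a row holding a 0 within one random column; the column sum stays fixed."""
    j = rng.randrange(len(cols))
    while True:
        one, zero = rng.randrange(n), rng.randrange(n)
        if one in cols[j] and zero not in cols[j]:
            break
    cols[j].remove(one)
    cols[j].add(zero)
    return cols


def genetic_binary(p, target, n=500, pop_size=40, generations=300, seed=1):
    """Evolve an n x d binary matrix with column means ~p and phi correlations close to target."""
    if pop_size < 8:
        raise ValueError("pop_size must be at least 8")
    rng = random.Random(seed)
    counts = [round(n * pj) for pj in p]  # number of 1s per column, fixed for the whole run
    if any(c <= 0 or c >= n for c in counts):
        raise ValueError("each n * p_j must round to a value strictly between 0 and n")
    pop = [[set(rng.sample(range(n), c)) for c in counts] for _ in range(pop_size)]
    n_elite = pop_size // 4
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, target, n))
        elite = pop[:n_elite]  # truncation selection; elites survive unchanged
        children = []
        for _ in range(pop_size - n_elite):
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), n, rng))
        pop = elite + children
    best = min(pop, key=lambda ind: fitness(ind, target, n))
    # Representative matrix: one tuple per row, one 0/1 entry per variable.
    return [tuple(int(i in col) for col in best) for i in range(n)]


def draw_sample(rows, m, rng):
    """Draw m observations uniformly with replacement from the representative matrix."""
    return rng.choices(rows, k=m)


def pattern_probability(rows, pattern):
    """Estimated probability of one observation vector: its relative frequency among the rows."""
    return rows.count(tuple(pattern)) / len(rows)
```

For example, `rows = genetic_binary([0.3, 0.5, 0.6], [[1, 0.4, 0.2], [0.4, 1, 0.3], [0.2, 0.3, 1]], n=200)` evolves a 200 × 3 matrix; `draw_sample(rows, 50, random.Random(2))` then returns 50 simulated observations, and `pattern_probability(rows, (1, 1, 1))` estimates the probability that all three variables equal 1. Fixing the column sums keeps the search focused on the correlation structure alone; note that phi correlations are bounded by the marginals, so a target that is infeasible for the given p can only be approached, not reached.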
Pages: 1-8
Number of pages: 8