A genetic algorithm for simulating correlated binary data from biomedical research

Cited by: 8
Authors
Kruppa, Jochen [1]
Lepenies, Bernd [2, 3]
Jung, Klaus [1, 3]
Affiliations
[1] Univ Vet Med Hannover, Inst Anim Breeding & Genet, Bunteweg 17p, D-30559 Hannover, Germany
[2] Univ Vet Med Hannover, Immunol Unit, Hannover, Germany
[3] Univ Vet Med Hannover, Res Ctr Emerging Infect & Zoonoses RIZ, Hannover, Germany
Keywords
Correlated binary data; Genetic algorithm; High-dimensional data; Random number generation; Computer simulation; DISTRIBUTIONS; ASSOCIATION; VARIABLES; MODELS
DOI
10.1016/j.compbiomed.2017.10.023
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Correlated binary data arise in a wide variety of biomedical research settings. Evaluating methods for their analysis often requires computer simulations of such data. Existing methods often cannot cover the full range of possible correlations between the variables, or are not available as implemented software. We propose a genetic algorithm that approaches the desired correlation structure under a given marginal distribution. The procedure generates a large representative matrix from which the probabilities of individual observations can be derived or from which samples can be drawn directly. Our genetic algorithm is evaluated under different specified marginal frequencies and correlation structures, and is compared against two existing approaches. The evaluation checks the speed and precision of the approach as well as its suitability for generating high-dimensional data. Using an example of high-throughput glycan array data, we demonstrate how our approach can be used to simulate the power of global test procedures. Implementations of our own method and of two other methods were added to the R package 'RepeatedHighDim'. The presented algorithm is not restricted to certain correlation structures. In contrast to existing methods, it is also evaluated for high-dimensional data.
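The idea summarised in the abstract (a representative binary matrix with fixed marginals that is evolved toward a target correlation structure) can be illustrated with a minimal sketch. This is not the authors' implementation and not the API of the 'RepeatedHighDim' package: it is a simplified, mutation-only evolutionary search, and all names (`pearson`, `fitness`, `init_columns`, `evolve`), the swap mutation, and the max-deviation fitness are assumptions chosen for illustration.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length 0/1 lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def fitness(cols, target):
    """Largest absolute gap between empirical and target correlations (assumed criterion)."""
    m = len(cols)
    return max(abs(pearson(cols[i], cols[j]) - target[i][j])
               for i in range(m) for j in range(i + 1, m))

def init_columns(n, marginals, rng):
    """One shuffled 0/1 column per variable, with round(n * p) ones each."""
    cols = []
    for p in marginals:
        k = round(n * p)
        col = [1] * k + [0] * (n - k)
        rng.shuffle(col)
        cols.append(col)
    return cols

def evolve(n, marginals, target, steps=2000, seed=1):
    """Swap two entries within a column; keep the swap if the fit does not get worse."""
    rng = random.Random(seed)
    cols = init_columns(n, marginals, rng)
    best = fitness(cols, target)
    for _ in range(steps):
        j = rng.randrange(len(cols))
        a, b = rng.randrange(n), rng.randrange(n)
        if cols[j][a] == cols[j][b]:
            continue  # swapping equal entries changes nothing
        cols[j][a], cols[j][b] = cols[j][b], cols[j][a]
        new = fitness(cols, target)
        if new <= best:
            best = new
        else:
            cols[j][a], cols[j][b] = cols[j][b], cols[j][a]  # revert the swap
    return cols, best

# Hypothetical inputs: three variables with illustrative marginals and correlations.
marginals = [0.5, 0.3, 0.4]
target = [[1.0, 0.3, 0.2],
          [0.3, 1.0, 0.1],
          [0.2, 0.1, 1.0]]
cols, best = evolve(100, marginals, target)

# Draw observations directly by sampling rows of the representative matrix.
rows = list(zip(*cols))
sample = random.Random(2).choices(rows, k=10)
```

Because the mutation only swaps entries within a column, the marginal frequencies are preserved exactly while the correlations are adjusted. A fuller genetic algorithm would also maintain a population and apply selection and recombination.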
Pages: 1-8
Page count: 8
Related papers
50 in total
  • [1] Simulating correlated count data
    L. Madsen
    D. Dalthorp
    Environmental and Ecological Statistics, 2007, 14 : 129 - 148
  • [2] Simulating correlated count data
    Madsen, L.
    Dalthorp, D.
    ENVIRONMENTAL AND ECOLOGICAL STATISTICS, 2007, 14 (02) : 129 - 148
  • [3] i2d: an R package for simulating data from images and the implications in biomedical research
    Liang, Xiaoyu
    Hu, Ying
    Yan, Chunhua
    Xu, Ke
    BIOINFORMATICS, 2021, 37 (16) : 2497 - 2498
  • [4] An efficient binary chimp optimization algorithm for feature selection in biomedical data classification
    Elnaz Pashaei
    Elham Pashaei
    Neural Computing and Applications, 2022, 34 : 6427 - 6451
  • [5] An efficient binary chimp optimization algorithm for feature selection in biomedical data classification
    Pashaei, Elnaz
    Pashaei, Elham
    NEURAL COMPUTING & APPLICATIONS, 2022, 34 (08) : 6427 - 6451
  • [6] A simple and effective method for simulating nested exchangeable correlated binary data for longitudinal cluster randomised trials
    Bowden, Rhys A.
    Kasza, Jessica
    Forbes, Andrew B.
    BMC MEDICAL RESEARCH METHODOLOGY, 2024, 24 (01)
  • [7] Simulating interferometric data of binary systems
    Paladini, Claudia
    Jorissen, Alain
    Siopis, Christos
    Sadowski, Gilles
    Shulyak, Denis
    Li Causi, Gianluca
    OPTICAL AND INFRARED INTERFEROMETRY IV, 2014, 9146
  • [8] An efficient MCEM algorithm for fitting generalized linear mixed models for correlated binary data
    Tan, M.
    Tian, G. -L.
    Fang, H. -B.
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2007, 77 (11-12) : 929 - 943
  • [9] Comparisons of spatially correlated binary data
    Sim, SY
    Johnson, RA
    STATISTICS & PROBABILITY LETTERS, 1998, 39 (02) : 81 - 87
  • [10] Correlated Binary Data for Machine Learning
    Llobet Turro, Marti
    Cabrera-Bean, Margarita
    29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021 : 1411 - 1415