Handling missing values in population data: consequences for maximum likelihood estimation of haplotype frequencies

Cited by: 24
Authors
Gourraud, PA
Génin, E
Cambon-Thomsen, A
Affiliations
[1] Fac Med Toulouse, INSERM, U558, F-31073 Toulouse, France
[2] Hop Paul Brousse, INSERM, U535, F-94817 Villejuif, France
Keywords
EM algorithm; HLA; haplotype; missing values; bioinformatics; linkage disequilibrium
DOI
10.1038/sj.ejhg.5201233
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Haplotype frequency estimation in population data is an important problem in genetics and different methods including expectation maximisation (EM) methods have been proposed. The statistical properties of EM methods have been extensively assessed for data sets with no missing values. When numerous markers and/or individuals are tested, however, it is likely that some genotypes will be missing. Thus, it is of interest to investigate the behaviour of the method in the presence of incomplete genotype observations. We propose an extension of the EM method to handle missing genotypes, and we compare it with commonly used methods (such as ignoring individuals with incomplete genotype information or treating a missing allele as any other allele). Simulations were performed, starting from data sets of haematopoietic stem cell donors genotyped at three HLA loci. We deleted some data to create incomplete genotype observations in various proportions. We then compared the haplotype frequencies obtained on these incomplete data sets using the different methods to those obtained on the complete data. We found that the method proposed here provides better estimations, both qualitatively and quantitatively, but increases the computation time required. We discuss the influence of missing values on the algorithm's efficiency and the advantages and disadvantages of deleting incomplete genotypes. We propose guidelines for missing data handling in routine analysis.
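The idea of extending EM to incomplete genotypes can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that a locus with any missing allele is treated as entirely missing (encoded as `None`), that every locus is observed in at least one individual, and that a missing locus may carry any pair of alleles seen elsewhere in the sample. The function names `compatible_pairs` and `em_haplotype_frequencies` and the toy alleles `A1`, `A2`, `B1`, `B2` are hypothetical.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype, alleles_per_locus):
    """List ordered haplotype pairs (h1, h2) consistent with one genotype."""
    per_locus = []
    for locus, obs in enumerate(genotype):
        if obs is None:
            # Missing locus (assumption): any ordered pair of sample alleles.
            alleles = alleles_per_locus[locus]
            per_locus.append([(a, b) for a in alleles for b in alleles])
        else:
            a, b = obs
            # Heterozygous loci have two possible phases, homozygous loci one.
            per_locus.append([(a, b)] if a == b else [(a, b), (b, a)])
    pairs = []
    for choice in product(*per_locus):
        h1 = tuple(c[0] for c in choice)
        h2 = tuple(c[1] for c in choice)
        pairs.append((h1, h2))
    return pairs


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM, keeping incomplete individuals."""
    n_loci = len(genotypes[0])
    alleles_per_locus = [
        sorted({a for g in genotypes if g[k] is not None for a in g[k]})
        for k in range(n_loci)
    ]
    expansions = [compatible_pairs(g, alleles_per_locus) for g in genotypes]
    haplotypes = sorted({h for pairs in expansions for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    for _ in range(n_iter):
        # E-step: expected haplotype counts under Hardy-Weinberg proportions.
        counts = defaultdict(float)
        for pairs in expansions:
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: each individual contributes two haplotypes.
        new_freq = {h: counts[h] / (2 * len(genotypes)) for h in haplotypes}
        delta = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if delta < tol:
            break
    return freq


# Toy data: two loci, four individuals, the last one untyped at the second locus.
genotypes = [
    (("A1", "A1"), ("B1", "B1")),
    (("A1", "A2"), ("B1", "B2")),
    (("A2", "A2"), ("B2", "B2")),
    (("A1", "A2"), None),
]
freq = em_haplotype_frequencies(genotypes)
for haplotype, f in sorted(freq.items()):
    print(haplotype, round(f, 4))
```

The number of compatible pairs grows with the square of the number of alleles at each missing locus, which is consistent with the increased computation time the abstract reports. Deleting incomplete individuals corresponds to filtering out genotypes containing `None` before calling the function.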
Pages: 805-812
Number of pages: 8
Related papers
  • [1] Handling missing values in population data: consequences for maximum likelihood estimation of haplotype frequencies
    Pierre-Antoine Gourraud
    Emmanuelle Génin
    Anne Cambon-Thomsen
    [J]. European Journal of Human Genetics, 2004, 12 : 805 - 812
  • [2] MAXIMUM-LIKELIHOOD-ESTIMATION OF MOLECULAR HAPLOTYPE FREQUENCIES IN A DIPLOID POPULATION
    EXCOFFIER, L
    SLATKIN, M
    [J]. MOLECULAR BIOLOGY AND EVOLUTION, 1995, 12 (05) : 921 - 927
  • [3] Notes on the maximum likelihood estimation of haplotype frequencies
    Mano, S
    Yasuda, N
    Katoh, T
    Tounai, K
    Inoko, H
    Imanishi, T
    Tamiya, G
    Gojobori, T
    [J]. ANNALS OF HUMAN GENETICS, 2004, 68 : 257 - 264
  • [4] Consequences of Model Misspecification for Maximum Likelihood Estimation with Missing Data
    Golden, Richard M.
    Henley, Steven S.
    White, Halbert
    Kashner, T. Michael
    [J]. ECONOMETRICS, 2019, 7 (03)
  • [5] Maximum-likelihood estimation of haplotype frequencies in nuclear families
    Becker, T
    Knapp, M
    [J]. GENETIC EPIDEMIOLOGY, 2004, 27 (01) : 21 - 32
  • [6] Maximum-likelihood estimation of haplotype frequencies in trio pedigrees
    Zhang, Qiangfeng
    Xu, Yun
    Chen, Guoliang
    Che, Haoyang
    [J]. FIRST INTERNATIONAL MULTI-SYMPOSIUMS ON COMPUTER AND COMPUTATIONAL SCIENCES (IMSCCS 2006), PROCEEDINGS, VOL 1, 2006, : 35 - +
  • [7] Maximum likelihood estimation in graphical models with missing values
    Didelez, V
    Pigeot, I
    [J]. BIOMETRIKA, 1998, 85 (04) : 960 - 966
  • [8] Handling missing data when estimating causal effects with targeted maximum likelihood estimation
    Dashti, S. Ghazaleh
    Lee, Katherine J.
    Simpson, Julie A.
    White, Ian R.
    Carlin, John B.
    Moreno-Betancur, Margarita
    [J]. AMERICAN JOURNAL OF EPIDEMIOLOGY, 2024, 193 (07) : 1019 - 1030
  • [9] MISSING DATA AND MAXIMUM-LIKELIHOOD ESTIMATION
    HSIAO, C
    [J]. ECONOMICS LETTERS, 1980, 6 (03) : 249 - 253
  • [10] Handling missing data for causal effect estimation in cohort studies using Targeted Maximum Likelihood Estimation
    Dashti, Ghazaleh
    Lee, Katherine J.
    Simpson, Julie A.
    White, Ian R.
    Carlin, John B.
    Moreno-Betancur, Margarita
    [J]. INTERNATIONAL JOURNAL OF EPIDEMIOLOGY, 2021, 50 : 55 - 55