Handling missing values in population data: consequences for maximum likelihood estimation of haplotype frequencies

Cited by: 24

Authors:

Gourraud, PA

Génin, E

Cambon-Thomsen, A

Affiliations:

[1] Fac Med Toulouse, INSERM, U558, F-31073 Toulouse, France

[2] Hop Paul Brousse, INSERM, U535, F-94817 Villejuif, France

Source:

EUROPEAN JOURNAL OF HUMAN GENETICS | 2004 / Vol. 12 / Issue 10

Keywords:

EM algorithm; HLA; haplotype; missing values; bioinformatics; linkage disequilibrium;

DOI:

10.1038/sj.ejhg.5201233

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Subject classification code:

071010 ; 081704 ;

Abstract:

Haplotype frequency estimation in population data is an important problem in genetics and different methods including expectation maximisation (EM) methods have been proposed. The statistical properties of EM methods have been extensively assessed for data sets with no missing values. When numerous markers and/or individuals are tested, however, it is likely that some genotypes will be missing. Thus, it is of interest to investigate the behaviour of the method in the presence of incomplete genotype observations. We propose an extension of the EM method to handle missing genotypes, and we compare it with commonly used methods (such as ignoring individuals with incomplete genotype information or treating a missing allele as any other allele). Simulations were performed, starting from data sets of haematopoietic stem cell donors genotyped at three HLA loci. We deleted some data to create incomplete genotype observations in various proportions. We then compared the haplotype frequencies obtained on these incomplete data sets using the different methods to those obtained on the complete data. We found that the method proposed here provides better estimations, both qualitatively and quantitatively, but increases the computation time required. We discuss the influence of missing values on the algorithm's efficiency and the advantages and disadvantages of deleting incomplete genotypes. We propose guidelines for missing data handling in routine analysis.
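To make the idea in the abstract concrete, the following is a minimal Python sketch of an EM estimator in which a locus with a missing genotype is treated as compatible with every ordered pair of alleles at that locus. It is an illustration under stated assumptions, not the authors' implementation: the function names `compatible_pairs` and `em_haplotype_freqs`, the data layout (one `(allele, allele)` tuple per locus, or `None` when missing), the uniform starting frequencies, and the Hardy-Weinberg weighting are all choices made here for the example.

```python
from itertools import product


def compatible_pairs(genotype, alleles):
    """List ordered haplotype pairs consistent with one individual's genotype.

    genotype: one entry per locus, either (a, b) or None when missing.
    alleles:  one tuple of possible alleles per locus (assumed known).
    """
    per_locus = []
    for observed, locus_alleles in zip(genotype, alleles):
        if observed is None:
            # Missing locus: every ordered pair of alleles stays possible.
            options = list(product(locus_alleles, repeat=2))
        else:
            a, b = observed
            # Heterozygous loci have two possible phases.
            options = [(a, b)] if a == b else [(a, b), (b, a)]
        per_locus.append(options)
    pairs = []
    for combo in product(*per_locus):
        h1 = tuple(c[0] for c in combo)
        h2 = tuple(c[1] for c in combo)
        pairs.append((h1, h2))
    return pairs


def em_haplotype_freqs(genotypes, alleles, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM, assuming random mating."""
    haplotypes = list(product(*alleles))
    freqs = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    expansions = [compatible_pairs(g, alleles) for g in genotypes]
    for _ in range(n_iter):
        # E-step: share each individual's two haplotypes among the
        # compatible pairs in proportion to their current probability.
        counts = dict.fromkeys(haplotypes, 0.0)
        for pairs in expansions:
            weights = [freqs[h1] * freqs[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: normalise expected counts over 2N chromosomes.
        new_freqs = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        delta = max(abs(new_freqs[h] - freqs[h]) for h in haplotypes)
        freqs = new_freqs
        if delta < tol:
            break
    return freqs
```

In this sketch an individual with a missing locus still contributes information from the typed loci, whereas deleting that individual would discard it. The cost is that the number of compatible haplotype pairs grows with the number of alleles at each missing locus, which is consistent with the increased computation time the abstract reports.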


Pages: 805-812

Page count: 8

