Quantifying the amount of missing information in genetic association studies

Cited by: 18
Author
Nicolae, Dan L.
Affiliations
[1] Univ Chicago, Dept Med, Chicago, IL 60637 USA
[2] Univ Chicago, Dept Stat, Chicago, IL 60637 USA
Keywords
information content; multi-locus linkage disequilibrium; asymptotic relative efficiency; association testing; case-control design
DOI
10.1002/gepi.20181
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007 ; 090102 ;
Abstract
Many genetic analyses are done with incomplete information; for example, phase is unknown in haplotype-based association studies. Measures of the amount of available information can be used for efficient planning of studies and/or analyses. In particular, the linkage disequilibrium (LD) between two sets of markers can be interpreted as the amount of information one set of markers contains for testing allele frequency differences in the second set, and measuring LD can be viewed as quantifying information in a missing data problem. We introduce a framework for measuring the association between two sets of variables; for example, genotype data for two distinct groups of markers, or haplotype and genotype data for a given set of polymorphisms. The goal is to quantify how much information is in one data set (e.g. genotype data for a set of SNPs) for estimating parameters that are functions of frequencies in the second data set (e.g. haplotype frequencies), relative to the ideal case of actually observing the complete data (e.g. haplotypes). In the case of genotype data on two mutually exclusive sets of markers, the measure determines the amount of multi-locus LD, and is equal to the classical measure r^2 if each set consists of one bi-allelic marker. In general, the measures are interpreted as the asymptotic ratio of sample sizes necessary to achieve the same power in case-control testing. The focus of this paper is on case-control allele/haplotype tests, but the framework can be extended easily to other settings, such as regressing quantitative traits on allele/haplotype counts, or tests on genotypes or diplotypes. We highlight applications of the approach, including tools for navigating the HapMap database [The International HapMap Consortium, 2003], and genotyping strategies for positional cloning studies. Genet. Epidemiol. 30:703-717, 2006. © 2006 Wiley-Liss, Inc.
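The abstract states that, for two single bi-allelic markers, the measure equals the classical r^2 and can be read as an asymptotic ratio of sample sizes. The following is a minimal sketch of that special case only; it does not implement the paper's general multi-locus measure. The function names and the example frequencies are hypothetical, and the inflation factor n / r^2 is the usual reading of r^2 as a relative efficiency, assumed here rather than taken from the paper.

```python
import math


def r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Classical pairwise LD measure r^2 for two bi-allelic markers.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first marker
    p_b:  frequency of allele B at the second marker
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = p_ab - p_a * p_b  # LD coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


def proxy_sample_size(n_direct: int, r2: float) -> int:
    """Sample size needed when testing a proxy marker instead of the
    untyped marker itself, assuming the required size scales as 1 / r^2."""
    if not (0.0 < r2 <= 1.0):
        raise ValueError("r^2 must be in (0, 1]")
    return math.ceil(n_direct / r2)


if __name__ == "__main__":
    # Hypothetical frequencies chosen only for illustration.
    r2 = r_squared(p_ab=0.4, p_a=0.5, p_b=0.5)
    print(f"r^2 = {r2:.2f}")
    print("samples needed via proxy:", proxy_sample_size(1000, r2))
```

With these illustrative frequencies r^2 is about 0.36, so a study that would need 1000 samples when typing the marker directly would need roughly 2.8 times as many when relying on the proxy.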
Pages: 703-717
Page count: 15
Related papers
50 in total
  • [21] Quantifying the Extent to Which Index Event Biases Influence Large Genetic Association Studies
    Yaghootkar, Hanieh
    Bancks, Michael P.
    Jones, Sam E.
    McDaid, Aaron F.
    Beaumont, Robin
    Donnelly, Louise
    Wood, Andrew R.
    Campbell, Archie
    Tyrrell, Jessica
    Hocking, Lynne J.
    Tuke, Marcus A.
    Ruth, Katherine S.
    Pearson, Ewan R.
    Murray, Anna
    Freathy, Rachel M.
    Munroe, Patricia B.
    Hayward, Caroline
    Palmer, Colin
    Weedon, Michael N.
    Pankow, James S.
    Frayling, Timothy M.
    Kutalik, Zoltan
    HUMAN HEREDITY, 2016, 81 (04) : 214 - 214
  • [22] The Missing Diversity in Human Genetic Studies
    Sirugo, Giorgio
    Williams, Scott M.
    Tishkoff, Sarah A.
    CELL, 2019, 177 (01) : 26 - 31
  • [23] Effect of missing sire information on genetic evaluation
    Harder, B
    Bennewitz, J
    Reinsch, N
    Mayer, M
    Kalm, E
    ARCHIV FUR TIERZUCHT-ARCHIVES OF ANIMAL BREEDING, 2005, 48 (03): : 219 - 232
  • [24] QUANTIFYING THE AMOUNT OF VERBOSENESS
    BEIGEL, R
    KUMMER, M
    STEPHAN, F
    LECTURE NOTES IN COMPUTER SCIENCE, 1992, 620 : 21 - 32
  • [25] QUANTIFYING THE AMOUNT OF VERBOSENESS
    BEIGEL, R
    KUMMER, M
    STEPHAN, F
    INFORMATION AND COMPUTATION, 1995, 118 (01) : 73 - 90
  • [26] A method to incorporate prior information into score test for genetic association studies
    Zakharov, Sergii
    Teoh, Garrett H. K.
    Salim, Agus
    Thalamuthu, Anbupalam
    BMC BIOINFORMATICS, 2014, 15
  • [28] A likelihood approach to family-based association studies with ordinal responses and missing parental information
    Moungmai, Rungruttikarn
    Baksh, Fazil
    ANNALS OF HUMAN GENETICS, 2012, 76 : 418 - 418
  • [29] Analysis of case-control studies of genetic and environmental factors with missing genetic information and haplotype-phase ambiguity
    Spinka, C
    Carroll, RJ
    Chatterjee, N
    GENETIC EPIDEMIOLOGY, 2005, 29 (02) : 108 - 127