Efficient and Accurate Multiple-Phenotype Regression Method for High Dimensional Data Considering Population Structure

Cited by: 16
Authors
Joo, Jong Wha J. [1 ]
Kang, Eun Yong [2 ,3 ]
Org, Elin
Furlotte, Nick [2 ,3 ]
Parks, Brian
Hormozdiari, Farhad [2 ,3 ]
Lusis, Aldons J. [4 ,5 ]
Eskin, Eleazar [1 ,2 ,3 ,5 ]
Affiliations
[1] Univ Calif Los Angeles, Bioinformat Interdept PhD Program, Los Angeles, CA 90095 USA
[2] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
[3] Univ Calif Los Angeles, Dept Med, Los Angeles, CA 90095 USA
[4] Univ Calif Los Angeles, Dept Microbiol Immunol & Mol Genet, Los Angeles, CA 90095 USA
[5] Univ Calif Los Angeles, Dept Human Genet, Los Angeles, CA 90095 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
multivariate analysis; population structure; mixed models; GENOME-WIDE ASSOCIATION; MIXED-MODEL ANALYSIS; GENE-EXPRESSION; REGULATORY HOTSPOTS; COMPLEX TRAITS; MICE; STRATIFICATION; VARIANCE; YEAST; IDENTIFY;
DOI
10.1534/genetics.116.189712
Chinese Library Classification (CLC) number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
A typical genome-wide association study tests correlation between a single phenotype and each genotype one at a time. However, single-phenotype analysis might miss unmeasured aspects of complex biological networks. Analyzing many phenotypes simultaneously may increase the power to capture these unmeasured aspects and detect more variants. Several multivariate approaches aim to detect variants related to more than one phenotype, but these current approaches do not consider the effects of population structure. As a result, they may produce a substantial number of false positive identifications. Here, we introduce a new methodology, referred to as GAMMA for generalized analysis of molecular variance for mixed-model analysis, which is capable of simultaneously analyzing many phenotypes and correcting for population structure. In a simulated study using data implanted with true genetic effects, GAMMA accurately identifies these true effects without producing false positives induced by population structure. In these simulations, GAMMA is an improvement over other methods, which either fail to detect true effects or produce many false positive identifications. We further apply our method to genetic studies of yeast and gut microbiome from mice and show that GAMMA identifies several variants that are likely to have true biological mechanisms.
Pages: 1379-1390
Page count: 12
Related papers
50 in total
  • [1] Efficient and Accurate Multiple-Phenotypes Regression Method for High Dimensional Data Considering Population Structure
    Joo, Jong Wha J.
    Kang, Eun Yong
    Org, Elin
    Furlotte, Nick
    Parks, Brian
    Lusis, Aldons J.
    Eskin, Eleazar
    RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY (RECOMB 2015), 2015, 9029 : 136 - 153
  • [2] A Fully Automated Parallel-Processing R Package for High-Dimensional Multiple-Phenotype Analysis Considering Population Structure
    Lee, Gi Ju
    Park, Sung Min
    Jung, Junghyun
    Joo, Jong Wha J.
    INTERNATIONAL JOURNAL OF FUZZY LOGIC AND INTELLIGENT SYSTEMS, 2020, 20 (03) : 219 - 226
  • [3] Weighted multiple blockwise imputation method for high-dimensional regression with blockwise missing data
    Li, Jingmao
    Zhang, Qingzhao
    Chen, Song
    Fang, Kuangnan
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2023, 93 (03) : 459 - 474
  • [4] An efficient multiple kernel computation method for regression analysis of economic data
    Zhang, Xiangrong
    Hu, Longying
    Zhang, Lin
    NEUROCOMPUTING, 2013, 118 : 58 - 64
  • [5] An efficient index structure for high dimensional image data
    Yoo, JS
    Shin, MK
    Lee, SH
    Choi, KS
    Cho, KH
    Hur, DY
    ADVANCED MULTIMEDIA CONTENT PROCESSING, 1999, 1554 : 131 - 144
  • [6] An efficient clustering method of data mining for high-dimensional data
    Chang, JW
    Kang, HM
    8TH WORLD MULTI-CONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL II, PROCEEDINGS: COMPUTING TECHNIQUES, 2004, : 273 - 278
  • [7] An efficient clustering method for high-dimensional data mining
    Chang, JW
    Kim, YK
    ADVANCES IN ARTIFICIAL INTELLIGENCE - SBIA 2004, 2004, 3171 : 276 - 285
  • [8] GRID: A VARIABLE SELECTION AND STRUCTURE DISCOVERY METHOD FOR HIGH DIMENSIONAL NONPARAMETRIC REGRESSION
    Giordano, Francesco
    Lahiri, Soumendra Nath
    Parrella, Maria Lucia
    ANNALS OF STATISTICS, 2020, 48 (03) : 1848 - 1874
  • [9] Multivariate linear regression of high-dimensional fMRI data with multiple target variables
    Valente, Giancarlo
    Castellanos, Agustin Lage
    Vanacore, Gianluca
    Formisano, Elia
    HUMAN BRAIN MAPPING, 2014, 35 (05) : 2163 - 2177
  • [10] AN EFFICIENT METHOD FOR COLLECTING POPULATION HEALTH DATA ACROSS MULTIPLE HEALTH SYSTEMS
    Goyden, Jacob A.
    Liu, Rujia
    Lewis, Steven
    Sudano, Joseph J.
    Kaelber, David
    JOURNAL OF GENERAL INTERNAL MEDICINE, 2018, 33 : S110 - S111