Efficient and Accurate Multiple-Phenotype Regression Method for High Dimensional Data Considering Population Structure

Cited by: 16

Authors:

Joo, Jong Wha J. [1]

Kang, Eun Yong [2,3]

Org, Elin

Furlotte, Nick [2,3]

Parks, Brian

Hormozdiari, Farhad [2,3]

Lusis, Aldons J. [4,5]

Eskin, Eleazar [1,2,3,5]

Affiliations:

[1] Univ Calif Los Angeles, Bioinformat Interdept PhD Program, Los Angeles, CA 90095 USA

[2] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA

[3] Univ Calif Los Angeles, Dept Med, Los Angeles, CA 90095 USA

[4] Univ Calif Los Angeles, Dept Microbiol Immunol & Mol Genet, Los Angeles, CA 90095 USA

[5] Univ Calif Los Angeles, Dept Human Genet, Los Angeles, CA 90095 USA

Source:

GENETICS | 2016 / Volume 204 / Issue 04

Funding:

US National Institutes of Health; US National Science Foundation;

Keywords:

multivariate analysis; population structure; mixed models; GENOME-WIDE ASSOCIATION; MIXED-MODEL ANALYSIS; GENE-EXPRESSION; REGULATORY HOTSPOTS; COMPLEX TRAITS; MICE; STRATIFICATION; VARIANCE; YEAST; IDENTIFY;

DOI:

10.1534/genetics.116.189712

Chinese Library Classification (CLC) number:

Q3 [Genetics];

Discipline classification code:

071007 ; 090102 ;

Abstract:

A typical genome-wide association study tests correlation between a single phenotype and each genotype one at a time. However, single-phenotype analysis might miss unmeasured aspects of complex biological networks. Analyzing many phenotypes simultaneously may increase the power to capture these unmeasured aspects and detect more variants. Several multivariate approaches aim to detect variants related to more than one phenotype, but these current approaches do not consider the effects of population structure. As a result, these approaches may result in a significant amount of false positive identifications. Here, we introduce a new methodology, referred to as GAMMA for generalized analysis of molecular variance for mixed-model analysis, which is capable of simultaneously analyzing many phenotypes and correcting for population structure. In a simulated study using data implanted with true genetic effects, GAMMA accurately identifies these true effects without producing false positives induced by population structure. In simulations with this data, GAMMA is an improvement over other methods which either fail to detect true effects or produce many false positive identifications. We further apply our method to genetic studies of yeast and gut microbiome from mice and show that GAMMA identifies several variants that are likely to have true biological mechanisms.

Pages: 1379 - 1390

Page count: 12

50 items in total

[1] Efficient and Accurate Multiple-Phenotypes Regression Method for High Dimensional Data Considering Population Structure
Joo, Jong Wha J.
Kang, Eun Yong
Org, Elin
Furlotte, Nick
Parks, Brian
Lusis, Aldons J.
Eskin, Eleazar
RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY (RECOMB 2015), 2015, 9029 : 136 - 153
[2] A Fully Automated Parallel-Processing R Package for High-Dimensional Multiple-Phenotype Analysis Considering Population Structure
Lee, Gi Ju
Park, Sung Min
Jung, Junghyun
Joo, Jong Wha J.
INTERNATIONAL JOURNAL OF FUZZY LOGIC AND INTELLIGENT SYSTEMS, 2020, 20 (03) : 219 - 226
[3] Weighted multiple blockwise imputation method for high-dimensional regression with blockwise missing data
Li, Jingmao
Zhang, Qingzhao
Chen, Song
Fang, Kuangnan
JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2023, 93 (03) : 459 - 474
[4] An efficient multiple kernel computation method for regression analysis of economic data
Zhang, Xiangrong
Hu, Longying
Zhang, Lin
NEUROCOMPUTING, 2013, 118 : 58 - 64
[5] An efficient index structure for high dimensional image data
Yoo, JS
Shin, MK
Lee, SH
Choi, KS
Cho, KH
Hur, DY
ADVANCED MULTIMEDIA CONTENT PROCESSING, 1999, 1554 : 131 - 144
[6] An efficient clustering method of data mining for high-dimensional data
Chang, JW
Kang, HM
8TH WORLD MULTI-CONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL II, PROCEEDINGS: COMPUTING TECHNIQUES, 2004, : 273 - 278
[7] An efficient clustering method for high-dimensional data mining
Chang, JW
Kim, YK
ADVANCES IN ARTIFICIAL INTELLIGENCE - SBIA 2004, 2004, 3171 : 276 - 285
[8] GRID: A VARIABLE SELECTION AND STRUCTURE DISCOVERY METHOD FOR HIGH DIMENSIONAL NONPARAMETRIC REGRESSION
Giordano, Francesco
Lahiri, Soumendra Nath
Parrella, Maria Lucia
ANNALS OF STATISTICS, 2020, 48 (03) : 1848 - 1874
[9] Multivariate linear regression of high-dimensional fMRI data with multiple target variables
Valente, Giancarlo
Castellanos, Agustin Lage
Vanacore, Gianluca
Formisano, Elia
HUMAN BRAIN MAPPING, 2014, 35 (05) : 2163 - 2177
[10] AN EFFICIENT METHOD FOR COLLECTING POPULATION HEALTH DATA ACROSS MULTIPLE HEALTH SYSTEMS
Goyden, Jacob A.
Liu, Rujia
Lewis, Steven
Sudano, Joseph J.
Kaelber, David
JOURNAL OF GENERAL INTERNAL MEDICINE, 2018, 33 : S110 - S111
