Efficient and Accurate Multiple-Phenotype Regression Method for High Dimensional Data Considering Population Structure

Cited by: 16

Authors:

Joo, Jong Wha J. [1]

Kang, Eun Yong [2,3]

Org, Elin

Furlotte, Nick [2,3]

Parks, Brian

Hormozdiari, Farhad [2,3]

Lusis, Aldons J. [4,5]

Eskin, Eleazar [1,2,3,5]

Affiliations:

[1] Univ Calif Los Angeles, Bioinformat Interdept PhD Program, Los Angeles, CA 90095 USA

[2] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA

[3] Univ Calif Los Angeles, Dept Med, Los Angeles, CA 90095 USA

[4] Univ Calif Los Angeles, Dept Microbiol Immunol & Mol Genet, Los Angeles, CA 90095 USA

[5] Univ Calif Los Angeles, Dept Human Genet, Los Angeles, CA 90095 USA

Source:

GENETICS | 2016 / Vol. 204 / Issue 04

Funding:

National Institutes of Health (USA); National Science Foundation (USA)

Keywords:

multivariate analysis; population structure; mixed models; GENOME-WIDE ASSOCIATION; MIXED-MODEL ANALYSIS; GENE-EXPRESSION; REGULATORY HOTSPOTS; COMPLEX TRAITS; MICE; STRATIFICATION; VARIANCE; YEAST; IDENTIFY;

DOI:

10.1534/genetics.116.189712

Chinese Library Classification (CLC) number:

Q3 [Genetics];

Discipline classification code:

071007 ; 090102 ;

Abstract:

A typical genome-wide association study tests correlation between a single phenotype and each genotype one at a time. However, single-phenotype analysis might miss unmeasured aspects of complex biological networks. Analyzing many phenotypes simultaneously may increase the power to capture these unmeasured aspects and detect more variants. Several multivariate approaches aim to detect variants related to more than one phenotype, but these current approaches do not consider the effects of population structure. As a result, these approaches may result in a significant amount of false positive identifications. Here, we introduce a new methodology, referred to as GAMMA for generalized analysis of molecular variance for mixed-model analysis, which is capable of simultaneously analyzing many phenotypes and correcting for population structure. In a simulated study using data implanted with true genetic effects, GAMMA accurately identifies these true effects without producing false positives induced by population structure. In simulations with this data, GAMMA is an improvement over other methods which either fail to detect true effects or produce many false positive identifications. We further apply our method to genetic studies of yeast and gut microbiome from mice and show that GAMMA identifies several variants that are likely to have true biological mechanisms.

Pages: 1379-1390

Page count: 12
