Efficient and Accurate Multiple-Phenotype Regression Method for High Dimensional Data Considering Population Structure

Cited by: 16
Authors
Joo, Jong Wha J. [1]
Kang, Eun Yong [2, 3]
Org, Elin
Furlotte, Nick [2, 3]
Parks, Brian
Hormozdiari, Farhad [2, 3]
Lusis, Aldons J. [4, 5]
Eskin, Eleazar [1, 2, 3, 5]
Affiliations
[1] Univ Calif Los Angeles, Bioinformat Interdept PhD Program, Los Angeles, CA 90095 USA
[2] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
[3] Univ Calif Los Angeles, Dept Med, Los Angeles, CA 90095 USA
[4] Univ Calif Los Angeles, Dept Microbiol Immunol & Mol Genet, Los Angeles, CA 90095 USA
[5] Univ Calif Los Angeles, Dept Human Genet, Los Angeles, CA 90095 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA);
Keywords
multivariate analysis; population structure; mixed models; GENOME-WIDE ASSOCIATION; MIXED-MODEL ANALYSIS; GENE-EXPRESSION; REGULATORY HOTSPOTS; COMPLEX TRAITS; MICE; STRATIFICATION; VARIANCE; YEAST; IDENTIFY;
DOI
10.1534/genetics.116.189712
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
A typical genome-wide association study tests the correlation between a single phenotype and each genotype, one at a time. However, single-phenotype analysis might miss unmeasured aspects of complex biological networks. Analyzing many phenotypes simultaneously may increase the power to capture these unmeasured aspects and detect more variants. Several multivariate approaches aim to detect variants related to more than one phenotype, but current approaches do not consider the effects of population structure. As a result, they may produce a substantial number of false positive identifications. Here, we introduce a new methodology, referred to as GAMMA for generalized analysis of molecular variance for mixed-model analysis, which is capable of simultaneously analyzing many phenotypes and correcting for population structure. In a simulation study using data implanted with true genetic effects, GAMMA accurately identifies these true effects without producing false positives induced by population structure. In these simulations, GAMMA improves on other methods, which either fail to detect true effects or produce many false positive identifications. We further apply our method to genetic studies of yeast and of the gut microbiome in mice and show that GAMMA identifies several variants that are likely to have true biological mechanisms.
Pages: 1379-1390
Number of pages: 12