Identification of disease-associated loci using machine learning for genotype and network data integration

Cited by: 7
Authors
Leal, Luis G. [1 ]
David, Alessia [1 ]
Jarvelin, Marjo-Riitta [2 ,3 ,4 ,5 ,6 ]
Sebert, Sylvain [2 ,3 ]
Mannikko, Minna [2 ]
Karhunen, Ville [2 ,3 ,4 ,5 ,6 ]
Seaby, Eleanor [7 ]
Hoggart, Clive [8 ]
Sternberg, Michael J. E. [1 ]
Affiliations
[1] Imperial Coll London, Dept Life Sci, Ctr Integrat Syst Biol & Bioinformat, London SW7 2AZ, England
[2] Univ Oulu, Fac Med, Ctr Life Course Hlth Res, FI-90014 Oulu, Finland
[3] Univ Oulu, Bioctr Oulu, SF-90220 Oulu, Finland
[4] Oulu Univ Hosp, Unit Primary Hlth Care, Oulu 90220, Finland
[5] Imperial Coll London, Sch Publ Hlth, Dept Epidemiol & Biostat, MRC PHE Ctr Environm & Hlth, London W2 1PG, England
[6] Brunel Univ London, Dept Life Sci, Coll Hlth & Life Sci, Uxbridge UB8 3PH, Middx, England
[7] Broad Inst MIT & Harvard, Program Med & Populat Genet, Cambridge, MA 02142 USA
[8] Imperial Coll London, Dept Med, London W2 1PG, England
Funding
US National Institutes of Health; EU Horizon 2020; Academy of Finland; Wellcome Trust (UK); UK Medical Research Council;
Keywords
RISK;
DOI
10.1093/bioinformatics/btz310
Chinese Library Classification
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Motivation: Integration of different omics data could markedly help to identify biological signatures, understand the missing heritability of complex diseases and ultimately achieve personalized medicine. Standard regression models used in Genome-Wide Association Studies (GWAS) identify loci with a strong effect size, whereas GWAS meta-analyses are often needed to capture weak loci contributing to the missing heritability. Development of novel machine learning algorithms for merging genotype data with other omics data is highly needed as it could enhance the prioritization of weak loci.
Results: We developed cNMTF (corrected non-negative matrix tri-factorization), an integrative algorithm based on clustering techniques of biological data. This method assesses the interrelatedness between genotypes, phenotypes, the damaging effect of the variants and gene networks in order to identify loci-trait associations. cNMTF was used to prioritize genes associated with lipid traits in two population cohorts. We replicated 129 genes reported in GWAS world-wide and provided evidence that supports 85% of our findings (226 out of 265 genes), including recent associations in literature (NLGN1), regulators of lipid metabolism (DAB1) and pleiotropic genes for lipid traits (CARM1). Moreover, cNMTF performed efficiently against strong population structures by accounting for the individuals' ancestry. As the method is flexible in the incorporation of diverse omics data sources, it can be easily adapted to the user's research needs.
Pages: 5182 - 5190
Number of pages: 9