Gaussian Mixture Model Implementation for Population Stratification Estimation from Genomics Data

Cited by: 2
Authors
Budiarto, Arif [1 ,2 ]
Mahesworo, Bharuno [2 ]
Hidayat, Alam Ahmad [2 ]
Nurlaila, Ika [2 ,3 ]
Pardamean, Bens [2 ,4 ]
Affiliations
[1] Bina Nusantara Univ, Sch Comp Sci, Comp Sci Dept, Jakarta 11480, Indonesia
[2] Bina Nusantara Univ, Bioinformat & Data Sci Res Ctr, Jakarta 11480, Indonesia
[3] Bina Nusantara Univ, Informat Syst Dept, BINUS Online Learning, Jakarta 11480, Indonesia
[4] Bina Nusantara Univ, Comp Sci Dept, BINUS Grad Program Master Comp Sci, Jakarta 11480, Indonesia
Keywords
Population Stratification; Ancestry Estimation; Gaussian Mixture Model; Genomics; SNPs; ANCESTRY; INFERENCE;
DOI
10.1016/j.procs.2021.12.026
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Genomics studies, as opposed to socio-anthropological ones, have proven to be an excellent tool for characterizing biological relatedness and disease risk factors. For more than a decade, the Genome-wide Association Study (GWAS) has been the mainstay and most popular approach for analysing genomics data. Selecting confounding variables, such as ancestry estimates or population stratification, is essential to maintaining the quality of GWAS. Researchers have developed various methods for extracting population stratification information from high-dimensional genomics data, especially Single Nucleotide Polymorphism (SNP) data. In the present study, we propose an implementation of a Principal Component Analysis (PCA)-complemented Gaussian Mixture Model (GMM) as an unsupervised model to estimate population stratification from samples. The results of this approach were compared with those of K-means and of the commonly used ancestry estimation software fastSTRUCTURE. We found that our approach outperformed both, as shown by the average cluster and population scores. Furthermore, it was able to generate the probability distribution of each sample across all populations, despite its limited quality. These results warrant further investigation with much more comprehensive population coverage and more advanced algorithms. (C) 2021 The Authors. Published by Elsevier B.V.
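The pipeline the abstract describes (PCA for dimensionality reduction, then a GMM for soft population assignment, compared against K-means) can be illustrated with a minimal sketch. It is not the authors' implementation: the use of scikit-learn, the simulated 0/1/2 genotype matrix, the number of principal components, and the adjusted Rand index as a stand-in for the paper's cluster and population scores are all assumptions made here for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical toy data (an assumption, not the paper's dataset):
# 3 populations, 60 samples each, 200 SNPs coded as 0/1/2 allele counts.
n_pops, n_per_pop, n_snps = 3, 60, 200
allele_freqs = rng.uniform(0.05, 0.95, size=(n_pops, n_snps))
genotypes = np.vstack([
    rng.binomial(2, allele_freqs[k], size=(n_per_pop, n_snps))
    for k in range(n_pops)
])
true_labels = np.repeat(np.arange(n_pops), n_per_pop)

# Step 1: project the high-dimensional SNP matrix onto a few principal components.
# The choice of 5 components is arbitrary for this sketch.
pcs = PCA(n_components=5, random_state=0).fit_transform(genotypes)

# Step 2: fit a GMM on the components; predict_proba returns, for each sample,
# a probability distribution across the assumed populations.
gmm = GaussianMixture(n_components=n_pops, covariance_type="full", random_state=0)
gmm.fit(pcs)
membership = gmm.predict_proba(pcs)
gmm_labels = membership.argmax(axis=1)

# Baseline: hard assignments from K-means on the same components.
kmeans_labels = KMeans(n_clusters=n_pops, n_init=10, random_state=0).fit_predict(pcs)

# Adjusted Rand index is used only as an illustrative agreement score.
print("GMM ARI:    ", adjusted_rand_score(true_labels, gmm_labels))
print("K-means ARI:", adjusted_rand_score(true_labels, kmeans_labels))
print("Membership probabilities of first sample:", membership[0].round(3))
```

Unlike K-means, which yields a single hard label per sample, the GMM's `membership` matrix gives each sample a probability for every population, which corresponds to the per-sample probability distribution mentioned in the abstract.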
Pages: 202-210
Number of pages: 9