Evaluation of supervised machine-learning methods for predicting appearance traits from DNA

Cited by: 11
Authors
Katsara, Maria-Alexandra [1]
Branicki, Wojciech [2]
Walsh, Susan [3]
Kayser, Manfred [4]
Nothnagel, Michael [1, 5, 6]
Affiliations
[1] Univ Cologne, Cologne Ctr Genom, Cologne, Germany
[2] Jagiellonian Univ, Malopolska Ctr Biotechnol, Krakow, Poland
[3] Indiana Univ Purdue Univ Indianapolis IUPUI, Dept Biol, Indianapolis, IN USA
[4] Erasmus MC Univ Med Ctr Rotterdam, Dept Genet Identificat, Rotterdam, Netherlands
[5] Fac Med, Cologne, Germany
[6] Cologne Univ Hosp, Cologne, Germany
Funding
European Union Horizon 2020
Keywords
Externally visible characteristics; Predictive DNA analysis; Appearance prediction; Genetic prediction; DNA phenotyping; Forensic DNA phenotyping; Machine learning; Classifiers; GENOME-WIDE ASSOCIATION; SKIN COLOR PREDICTION; EYE COLOR; GENETIC-DETERMINANTS; PIGMENTATION; HAIR; SYSTEM; PHENOTYPES; COMPLEX; IMPACT;
DOI
10.1016/j.fsigen.2021.102507
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
The prediction of human externally visible characteristics (EVCs) based solely on DNA information has become an established approach in forensic and anthropological genetics in recent years. While predictive models for a large set of EVCs have already been established using multinomial logistic regression (MLR), the prediction performance of other possible classification methods has not been thoroughly investigated thus far. To identify a classifier that might outperform these trait-specific models, we conducted a systematic comparison between the widely used MLR and three popular machine learning (ML) classifiers that have shown good performance outside EVC prediction, namely support vector machines (SVM), random forest (RF) and artificial neural networks (ANN). As examples, we used eye, hair and skin color categories as phenotypes, and genotypes based on the previously established IrisPlex, HIrisPlex, and HIrisPlex-S DNA markers. We compared and assessed the performance of each of the four methods, complemented by detailed hyperparameter tuning that was applied to some of the methods in order to maximize their performance. Overall, we observed that all four classification methods showed rather similar performance, with no method being substantially superior to the others for any of the traits, although performance varied slightly across the different traits and more so across the trait categories. Hence, based on our findings, none of the ML methods applied here provides any advantage over MLR for appearance prediction, at least when it comes to the categorical pigmentation traits and the selected DNA markers used here.
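For readers unfamiliar with the baseline method named in the abstract, the following is a minimal sketch of how a multinomial logistic regression model turns SNP genotypes into category probabilities. Everything specific in it is an assumption made for illustration: the function name `mlr_probabilities`, the two-SNP genotype, the three eye color categories and all intercepts and coefficients are hypothetical and are not the published IrisPlex parameters.

```python
import math


def mlr_probabilities(genotype, intercepts, coefficients):
    """Return category probabilities from a multinomial logistic model.

    genotype: minor-allele counts (0, 1 or 2), one per SNP.
    intercepts, coefficients: one entry per non-reference category.
    The reference category is listed first and has a linear score of 0.
    """
    scores = [0.0]  # reference category
    for alpha, betas in zip(intercepts, coefficients):
        scores.append(alpha + sum(b * g for b, g in zip(betas, genotype)))
    # Subtract the largest score before exponentiating for numerical stability.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical values for illustration only; NOT the published IrisPlex model.
CATEGORIES = ["brown", "blue", "intermediate"]  # "brown" is the reference
INTERCEPTS = [-1.0, -2.0]                       # blue, intermediate
COEFFICIENTS = [[1.5, 0.3], [0.6, 0.2]]         # one row per non-reference category

probs = mlr_probabilities([2, 1], INTERCEPTS, COEFFICIENTS)
predicted = CATEGORIES[probs.index(max(probs))]
print(dict(zip(CATEGORIES, probs)), predicted)
```

The predicted category is simply the one with the highest probability. The ML classifiers compared in the study (SVM, RF, ANN) replace this linear scoring step with other decision functions while working from the same genotype input.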
Pages: 9