Evaluation of supervised machine-learning methods for predicting appearance traits from DNA

Cited by: 11
Authors
Katsara, Maria-Alexandra [1]
Branicki, Wojciech [2]
Walsh, Susan [3]
Kayser, Manfred [4]
Nothnagel, Michael [1, 5, 6]
Affiliations
[1] Univ Cologne, Cologne Ctr Genom, Cologne, Germany
[2] Jagiellonian Univ, Malopolska Ctr Biotechnol, Krakow, Poland
[3] Indiana Univ Purdue Univ Indianapolis IUPUI, Dept Biol, Indianapolis, IN USA
[4] Erasmus MC Univ Med Ctr Rotterdam, Dept Genet Identificat, Rotterdam, Netherlands
[5] Fac Med, Cologne, Germany
[6] Cologne Univ Hosp, Cologne, Germany
Funding
EU Horizon 2020
Keywords
Externally visible characteristics; Predictive DNA analysis; Appearance prediction; Genetic prediction; DNA phenotyping; Forensic DNA phenotyping; Machine learning; Classifiers; GENOME-WIDE ASSOCIATION; SKIN COLOR PREDICTION; EYE COLOR; GENETIC-DETERMINANTS; PIGMENTATION; HAIR; SYSTEM; PHENOTYPES; COMPLEX; IMPACT;
DOI
10.1016/j.fsigen.2021.102507
Chinese Library Classification
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
The prediction of human externally visible characteristics (EVCs) based solely on DNA information has become an established approach in forensic and anthropological genetics in recent years. While predictive models for a large set of EVCs have already been established using multinomial logistic regression (MLR), the prediction performance of other possible classification methods has not been thoroughly investigated thus far. To determine whether another classifier could outperform these trait-specific models, we conducted a systematic comparison between the widely used MLR and three popular machine learning (ML) classifiers that have shown good performance outside EVC prediction: support vector machines (SVM), random forest (RF) and artificial neural networks (ANN). As examples, we used eye, hair and skin color categories as phenotypes, with genotypes based on the previously established IrisPlex, HIrisPlex, and HIrisPlex-S DNA markers. We compared and assessed the performance of each of the four methods, complemented by detailed hyperparameter tuning that was applied to some of the methods in order to maximize their performance. Overall, all four classification methods performed similarly, with no method being substantially superior to the others for any of the traits, although performance varied slightly across traits and more so across trait categories. Hence, based on our findings, none of the ML methods applied here provides any advantage for appearance prediction, at least for the categorical pigmentation traits and the selected DNA markers used here.
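The kind of comparison the abstract describes can be illustrated with a minimal sketch. This is not the authors' pipeline: the genotypes and the three-category trait below are synthetic, the six "SNPs" are hypothetical rather than IrisPlex markers, scikit-learn is an assumed toolkit, and cross-validated accuracy is an assumed metric chosen for brevity. Only the SVM gets a small grid search, as a stand-in for the hyperparameter tuning mentioned above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, n_snps, n_classes = 300, 6, 3

# Synthetic genotypes (assumption): minor-allele counts 0/1/2 at made-up SNPs.
X = rng.integers(0, 3, size=(n_samples, n_snps)).astype(float)

# Synthetic categorical trait (assumption): a noisy multinomial-logit model.
weights = rng.normal(size=(n_snps, n_classes))
logits = (X - 1.0) @ weights + rng.gumbel(size=(n_samples, n_classes))
y = logits.argmax(axis=1)

# The four classifier families named in the abstract, with illustrative settings.
classifiers = {
    "MLR": LogisticRegression(max_iter=1000),
    "SVM": make_pipeline(
        StandardScaler(),
        GridSearchCV(SVC(), {"C": [0.1, 1.0, 10.0]}, cv=3),  # tiny tuning grid
    ),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "ANN": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
}

# Mean 5-fold cross-validated accuracy for each method on the same data.
scores = {
    name: cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    for name, clf in classifiers.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

On data generated from a logistic-type model like this one, MLR would be expected to be hard to beat, so the sketch only shows the mechanics of a like-for-like comparison and says nothing about real pigmentation data.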
Pages: 9