Evaluation of supervised machine-learning methods for predicting appearance traits from DNA

Cited by: 11
Authors
Katsara, Maria-Alexandra [1]
Branicki, Wojciech [2]
Walsh, Susan [3]
Kayser, Manfred [4]
Nothnagel, Michael [1,5,6]
Affiliations
[1] Univ Cologne, Cologne Ctr Genom, Cologne, Germany
[2] Jagiellonian Univ, Malopolska Ctr Biotechnol, Krakow, Poland
[3] Indiana Univ Purdue Univ Indianapolis IUPUI, Dept Biol, Indianapolis, IN USA
[4] Erasmus MC Univ Med Ctr Rotterdam, Dept Genet Identificat, Rotterdam, Netherlands
[5] Fac Med, Cologne, Germany
[6] Cologne Univ Hosp, Cologne, Germany
Funding
European Union Horizon 2020;
Keywords
Externally visible characteristics; Predictive DNA analysis; Appearance prediction; Genetic prediction; DNA phenotyping; Forensic DNA phenotyping; Machine learning; Classifiers; GENOME-WIDE ASSOCIATION; SKIN COLOR PREDICTION; EYE COLOR; GENETIC-DETERMINANTS; PIGMENTATION; HAIR; SYSTEM; PHENOTYPES; COMPLEX; IMPACT;
DOI
10.1016/j.fsigen.2021.102507
Chinese Library Classification (CLC) number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
The prediction of human externally visible characteristics (EVCs) based solely on DNA information has become an established approach in forensic and anthropological genetics in recent years. While predictive models for a large set of EVCs have already been established using multinomial logistic regression (MLR), the prediction performance of other possible classification methods has not been thoroughly investigated thus far. Motivated by the question of whether another classifier could outperform these trait-specific models, we conducted a systematic comparison between the widely used MLR and three popular machine learning (ML) classifiers that have shown good performance outside EVC prediction, namely support vector machines (SVM), random forest (RF) and artificial neural networks (ANN). As examples, we used eye, hair and skin color categories as phenotypes and genotypes of the previously established IrisPlex, HIrisPlex and HIrisPlex-S DNA markers. We compared and assessed the performance of each of the four methods, complemented by detailed hyperparameter tuning that was applied to some of the methods to maximize their performance. Overall, all four classification methods showed rather similar performance, with no method being substantially superior to the others for any of the traits, although performance varied slightly across traits and more so across trait categories. Hence, based on our findings, none of the ML methods applied here provides any advantage for appearance prediction, at least for the categorical pigmentation traits and the selected DNA markers used here.
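As a hedged illustration of the baseline model and of per-category evaluation mentioned in the abstract, the Python sketch below implements a reference-category multinomial logistic regression, the standard form of MLR for a categorical trait, and a one-vs-rest AUC, one common way to score each trait category separately. Everything here is an assumption for illustration: the coefficients, the two-SNP genotype vector, the category names, the hypothetical helpers `mlr_probabilities` and `one_vs_rest_auc`, and the use of AUC as the metric. None of this is the published IrisPlex/HIrisPlex model or the authors' evaluation code.

```python
import math

# Hypothetical coefficients for illustration only; NOT a published model.
# Each non-reference category has an intercept and one weight per SNP.
# Genotypes are assumed to be coded additively as the count (0, 1 or 2)
# of one chosen allele per SNP.
COEFS = {
    "blue": (1.0, [2.0, -0.5]),
    "intermediate": (-1.0, [0.5, 0.3]),
}
REFERENCE = "brown"  # reference category: its linear predictor is fixed at 0


def mlr_probabilities(genotypes):
    """Category probabilities from a reference-category multinomial logit."""
    scores = {REFERENCE: 0.0}
    for category, (intercept, weights) in COEFS.items():
        scores[category] = intercept + sum(w * g for w, g in zip(weights, genotypes))
    denom = sum(math.exp(s) for s in scores.values())
    return {c: math.exp(s) / denom for c, s in scores.items()}


def one_vs_rest_auc(labels, probs, category):
    """AUC for one category: probability that a random member of the category
    receives a higher predicted probability than a random non-member (ties = 0.5)."""
    pos = [p[category] for y, p in zip(labels, probs) if y == category]
    neg = [p[category] for y, p in zip(labels, probs) if y != category]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up samples: two SNP genotypes each, with observed categories.
    samples = [[2, 0], [0, 2], [1, 1], [0, 0]]
    labels = ["blue", "brown", "intermediate", "brown"]
    probs = [mlr_probabilities(g) for g in samples]
    for c in ("blue", "intermediate", "brown"):
        print(c, round(one_vs_rest_auc(labels, probs, c), 3))
```

Scoring each category separately, rather than reporting a single overall accuracy, is what makes differences between trait categories visible. The same scoring function could be applied unchanged to the probability outputs of SVM, RF or ANN classifiers when comparing them against MLR.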
Pages: 9