Novel applications of multitask learning and multiple output regression to multiple genetic trait prediction

Cited by: 45
Authors
He, Dan [1]
Kuhn, David [2]
Parida, Laxmi [1]
Affiliations
[1] IBM TJ Watson Res, Yorktown Hts, NY 10598 USA
[2] USDA ARS, Subtrop Hort Res Stn, 13601 Old Cutler Rd, Miami, FL 33158 USA
Keywords
MARKER-ASSISTED SELECTION; GENOMIC SELECTION
DOI
10.1093/bioinformatics/btw249
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually formulated as a linear regression model. In many cases, multiple traits are observed for the same set of samples and markers, and some of these traits may be correlated with each other. Modeling all of the traits together may therefore improve the prediction accuracy. In this work, we view the multitrait prediction problem from a machine learning angle: as either a multitask learning problem or a multiple output regression problem, depending on whether or not the different traits share the same genotype matrix. We then adapted multitask learning algorithms and multiple output regression algorithms to solve the multitrait prediction problem. We proposed a few strategies to improve the least squares error of the prediction from these algorithms. Our experiments show that modeling multiple traits together could improve the prediction accuracy for correlated traits.
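The shared-genotype-matrix setting described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a plain closed-form ridge regression with a multi-column trait matrix, run on simulated data. The function name `fit_multi_output_ridge`, the penalty `lam`, and all simulation settings are assumptions made for illustration. Note that this baseline solves each trait's column independently, so it does not by itself exploit trait correlation; it only shows the data layout (one genotype matrix `X`, one trait matrix `Y`) that multiple output regression methods start from.

```python
import numpy as np

def fit_multi_output_ridge(X, Y, lam=1.0):
    """Closed-form ridge: B = (X'X + lam*I)^{-1} X'Y, one column of B per trait.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n_markers = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_markers), X.T @ Y)

# Simulated data (assumption): 200 samples, 50 biallelic markers, 2 traits.
rng = np.random.default_rng(0)
n_samples, n_markers, n_traits = 200, 50, 2

# Genotypes encoded numerically as 0/1/2, then column-centered.
X = rng.integers(0, 3, size=(n_samples, n_markers)).astype(float)
X -= X.mean(axis=0)

# Marker effects share a common component, so the two traits are correlated.
shared = rng.normal(size=(n_markers, 1))
B_true = shared + 0.3 * rng.normal(size=(n_markers, n_traits))
Y = X @ B_true + rng.normal(size=(n_samples, n_traits))

# Hold out the last 50 samples for evaluation.
train, test = slice(0, 150), slice(150, 200)
B_hat = fit_multi_output_ridge(X[train], Y[train], lam=10.0)
Y_pred = X[test] @ B_hat

# Per-trait mean squared error on the held-out samples,
# and the correlation between the two traits in the training set.
mse = ((Y[test] - Y_pred) ** 2).mean(axis=0)
trait_corr = np.corrcoef(Y[train].T)[0, 1]
print("per-trait test MSE:", mse)
print("training trait correlation:", trait_corr)
```

A method that couples the traits (for example, through a shared penalty or a low-rank constraint on `B_hat`) would replace `fit_multi_output_ridge` while keeping the same `X` and `Y` layout.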
Pages: 37 - 43
Number of pages: 7
Related papers
50 in total
  • [1] A Bayesian Multiple Kernel Learning Framework for Single and Multiple Output Regression
    Goenen, Mehmet
    20TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2012), 2012, 242 : 354 - 359
  • [2] Calibrated Multiple-Output Quantile Regression with Representation Learning
    Feldman, Shai
    Bates, Stephen
    Romano, Yaniv
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [3] Active Output Selection Strategies for Multiple Learning Regression Models
    Prochaska, Adrian
    Pillas, Julien
    Baeker, Bernard
    PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS (ICPRAM), 2021, : 150 - 157
  • [5] Multiple trait and random regression models using linear splines for genetic evaluation of multiple breed populations
    Ribeiro, V. M. P.
    Raidan, F. S. S.
    Barbosa, A. R.
    Silva, M. V. G. B.
    Cardoso, F. F.
    Toral, F. L. B.
    JOURNAL OF DAIRY SCIENCE, 2019, 102 (01) : 464 - 475
  • [6] Multitask Learning Using Regularized Multiple Kernel Learning
    Gonen, Mehmet
    Kandemir, Melih
    Kaski, Samuel
    NEURAL INFORMATION PROCESSING, PT II, 2011, 7063 : 500 - 509
  • [7] Machine learning regression models for prediction of multiple ionospheric parameters
    Iban, Muzaffer Can
    Senturk, Erman
    ADVANCES IN SPACE RESEARCH, 2022, 69 (03) : 1319 - 1334
  • [8] Multiple Output Regression with Latent Noise
    Gillberg, Jussi
    Marttinen, Pekka
    Pirinen, Matti
    Kangas, Antti J.
    Soininen, Pasi
    Ali, Mehreen
    Havulinna, Aki S.
    Jarvelin, Marjo-Riitta
    Ala-Korpela, Mika
    Kaski, Samuel
    JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17
  • [9] The Output Prediction of Municipal Solid Waste Based on Multiple Regression Analysis Method
    Zang Xiuqing
    STATISTIC APPLICATION IN SCIENTIFIC AND SOCIAL REFORMATION, 2010, : 1121 - 1126
  • [10] Multiple Regression Genetic Programming
    Arnaldo, Ignacio
    Krawiec, Krzysztof
    O'Reilly, Una-May
    GECCO'14: PROCEEDINGS OF THE 2014 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, 2014, : 879 - 886