Novel applications of multitask learning and multiple output regression to multiple genetic trait prediction

Cited by: 45
Authors
He, Dan [1]
Kuhn, David [2]
Parida, Laxmi [1]
Affiliations
[1] IBM TJ Watson Res, Yorktown Hts, NY 10598 USA
[2] USDA ARS, Subtrop Hort Res Stn, 13601 Old Cutler Rd, Miami, FL 33158 USA
Keywords
MARKER-ASSISTED SELECTION; GENOMIC SELECTION
DOI
10.1093/bioinformatics/btw249
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually represented as linear regression models. In many cases, for the same set of samples and markers, multiple traits are observed. Some of these traits might be correlated with each other. Therefore, modeling all the multiple traits together may improve the prediction accuracy. In this work, we view the multitrait prediction problem from a machine learning angle: as either a multitask learning problem or a multiple output regression problem, depending on whether different traits share the same genotype matrix or not. We then adapted multitask learning algorithms and multiple output regression algorithms to solve the multitrait prediction problem. We proposed a few strategies to improve the least-squares error of the prediction from these algorithms. Our experiments show that modeling multiple traits together could improve the prediction accuracy for correlated traits.
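To make the idea of modeling several traits jointly concrete, the following is a minimal sketch, not the authors' algorithm. It assumes the shared-genotype-matrix setting and uses a generic L2,1-regularized linear regression solved by proximal gradient descent, so that each marker's effects on all traits are shrunk as a group. The function names (`l21_multitask_fit`, `predict`, `mse`), the penalty weight `lam`, and the toy genotype and trait values are all hypothetical choices made for illustration.

```python
import math

def l21_multitask_fit(X, Y, lam=0.05, n_iter=500):
    """Proximal gradient descent for
        min_W  1/(2n) * ||X W - Y||_F^2  +  lam * sum_j ||W[j]||_2
    X: n x p genotype matrix (list of rows); Y: n x k trait matrix.
    Returns W as a p x k list of lists (one row per marker)."""
    n, p, k = len(X), len(X[0]), len(Y[0])
    W = [[0.0] * k for _ in range(p)]
    # trace(X^T X) / n bounds the gradient's Lipschitz constant from above,
    # so its reciprocal is a safe (conservative) step size.
    step = n / sum(x * x for row in X for x in row)
    for _ in range(n_iter):
        # Residuals R = X W - Y, shape n x k.
        R = [[sum(X[i][j] * W[j][t] for j in range(p)) - Y[i][t]
              for t in range(k)] for i in range(n)]
        for j in range(p):
            # Gradient step on the row of marker j (its effects on all traits).
            row = [W[j][t] - step * sum(X[i][j] * R[i][t] for i in range(n)) / n
                   for t in range(k)]
            # Group soft-thresholding: shrink the whole row toward zero.
            norm = math.sqrt(sum(v * v for v in row))
            scale = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            W[j] = [scale * v for v in row]
    return W

def predict(X, W):
    """Predicted trait matrix X W (n x k)."""
    return [[sum(x_ij * W[j][t] for j, x_ij in enumerate(row))
             for t in range(len(W[0]))] for row in X]

def mse(Y, Y_hat):
    """Mean squared error over all samples and traits."""
    total = sum((y - yh) ** 2 for r, rh in zip(Y, Y_hat) for y, yh in zip(r, rh))
    return total / (len(Y) * len(Y[0]))

# Hypothetical toy data: 6 samples, 3 biallelic markers encoded 0/1/2.
X = [[0, 1, 2], [1, 0, 2], [2, 1, 0], [0, 2, 1], [1, 2, 0], [2, 0, 1]]
# Two correlated traits, both driven by the first marker (assumed effects).
Y = [[1.0 * g[0], 0.8 * g[0]] for g in X]

W = l21_multitask_fit(X, Y)
print("marker effects per trait:", W)
print("training MSE:", mse(Y, predict(X, W)))
```

The row-wise penalty is what couples the traits here: a marker is either retained for all traits or shrunk for all of them. When traits are measured on different sample sets, each trait would need its own genotype matrix, which this sketch does not cover.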
Pages: 37-43
Number of pages: 7
Related papers
50 in total
  • [31] MLM: A Benchmark Dataset for Multitask Learning with Multiple Languages and Modalities
    Armitage, Jason
    Kacupaj, Endri
    Tahmasebzadeh, Golsa
    Swati
    Maleshkova, Maria
    Ewerth, Ralph
    Lehmann, Jens
    CIKM '20: PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, 2020, : 2967 - 2974
  • [32] Efficient Multitask Multiple Kernel Learning With Application to Cancer Research
    Rahimi, Arezou
    Gonen, Mehmet
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (09) : 8716 - 8728
  • [34] A Unifying Framework for Typical Multitask Multiple Kernel Learning Problems
    Li, Cong
    Georgiopoulos, Michael
    Anagnostopoulos, Georgios C.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2014, 25 (07) : 1287 - 1297
  • [35] On applications of semiparametric Multiple Index regression
    Kim, Eun Jung
    EKC2008: PROCEEDINGS OF THE EU-KOREA CONFERENCE ON SCIENCE AND TECHNOLOGY, 2008, 124 : 455 - 462
  • [36] Drug-target Interaction Prediction via Multiple Output Deep Learning
    Ye, Qing
    Zhang, Xiaolong
    Lin, Xiaoli
    2020 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2020, : 507 - 510
  • [37] Novel Algorithm for Multiple Quantitative Trait Loci Mapping by Using Bayesian Variable Selection Regression
    Yuan, Lin
    Han, Kyungsook
    Huang, De-Shuang
    INTELLIGENT COMPUTING METHODOLOGIES, ICIC 2016, PT III, 2016, 9773 : 862 - 868
  • [38] The multiple sclerosis trait and the development of multiple sclerosis: Genetic vulnerability and environmental effect
    Poser, CM
    CLINICAL NEUROLOGY AND NEUROSURGERY, 2006, 108 (03) : 227 - 233
  • [39] Integrative analysis of multiple diverse omics datasets by sparse group multitask regression
    Lin, Dongdong
    Zhang, Jigang
    Li, Jingyao
    He, Hao
    Deng, Hong-Wen
    Wang, Yu-Ping
    FRONTIERS IN CELL AND DEVELOPMENTAL BIOLOGY, 2014, 2
  • [40] Regression Learning with Multiple Noisy Oracles
    Ristovski, Kosta
    Das, Debasish
    Ouzienko, Vladimir
    Guo, Yuhong
    Obradovic, Zoran
    ECAI 2010 - 19TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2010, 215 : 445 - 450