Using input dependent weights for model combination and model selection with multiple sources of data

Authors
Pan, Wei [1]
Xiao, Guanghua [1]
Huang, Xiaohong [1]
Affiliations
[1] Univ Minnesota, Sch Publ Hlth, Div Biostat, Minneapolis, MN 55455 USA
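Keywords
classification; microarray data; model mixing; partial least squares; prediction
DOI
Not available
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714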
Abstract
With various sources and large amounts of genomic and proteomic data accumulating, the importance of integrative analyses of multiple sources of data has been increasingly recognized. A natural approach is to combine multiple models, each built on one source of data. A challenge, however, is to account for the different local information content of the different sources of data: the choice of the weight on each candidate model (and thus on each source of data) may depend on the input for which a prediction is to be made, suggesting that the constant weights used in most existing approaches may not be optimal. Here we propose an input-dependent weighting (IDW) scheme in which the weight is the probability that each model gives a correct prediction for the given input. The weights can be estimated by regression using training data. We apply IDW to discriminating human heart failure etiology using two sources of gene expression data, and to gene function prediction by a combined analysis of gene expression and protein-protein interaction data. It is demonstrated that IDW may perform better than some standard approaches. Input-dependent weights can also be adopted as a criterion for model selection.
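The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-dimensional input, the toy correctness indicators, the choice of logistic regression as the regression step, and the normalization of the weights to sum to one are all assumptions made for illustration, and the function names (`fit_logistic`, `idw_weights`, `idw_combine`) are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y=1 | x) = sigmoid(a + b*x) by gradient ascent on the mean log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = sum(y - sigmoid(a + b * x) for x, y in zip(xs, ys))
        grad_b = sum((y - sigmoid(a + b * x)) * x for x, y in zip(xs, ys))
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

def idw_weights(x, fits):
    """Estimated probability that each model is correct at input x, normalized (assumption)."""
    probs = [sigmoid(a + b * x) for a, b in fits]
    total = sum(probs)
    return [p / total for p in probs]

def idw_combine(x, fits, model_probs):
    """Weighted average of the candidate models' predicted class probabilities at x."""
    weights = idw_weights(x, fits)
    return sum(w * p for w, p in zip(weights, model_probs))

# Toy training data (made up): 1 means the model classified that training input correctly.
# Hypothetical model A is mostly right for x < 0, model B mostly right for x > 0.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
correct_a = [1, 1, 1, 0, 1, 0, 0, 0]
correct_b = [0, 0, 0, 1, 0, 1, 1, 1]

# One correctness regression per candidate model.
fits = [fit_logistic(xs, correct_a), fit_logistic(xs, correct_b)]

# At x = -1.5 model A should receive the larger weight; at x = 1.5, model B.
print(idw_weights(-1.5, fits))
print(idw_weights(1.5, fits))
# Combine hypothetical class-1 probabilities 0.9 (model A) and 0.2 (model B) at x = -1.5.
print(idw_combine(-1.5, fits, [0.9, 0.2]))
```

Under these assumptions, a constant-weight scheme would assign the same weights at every input, whereas here the weights shift toward whichever model the correctness regression favors near the query point.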
Pages: 523-540
Page count: 18