Scalable randomized kernel methods for multiview data integration and prediction with application to Coronavirus disease

Times cited: 0
Authors
Safo, Sandra E. [1]
Lu, Han [1]
Affiliations
[1] Univ Minnesota, Div Biostat & Hlth Data Sci, 2221 Univ Ave SE, Minneapolis, MN 55414 USA
Funding
National Institutes of Health (USA)
Keywords
data integration; high-dimensional data; kernel; multiview learning; nonlinearity; randomized Fourier features; discriminant analysis
DOI
Not available
Chinese Library Classification number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
There is still more to learn about the pathobiology of coronavirus disease (COVID-19) despite 4 years of the pandemic. A multiomics approach offers a comprehensive view of the disease and has the potential to yield deeper insight into its pathogenesis. Previous multiomics integrative analysis and prediction studies for COVID-19 severity and status have assumed simple (i.e., linear) relationships between omics data and between omics and COVID-19 outcomes. However, these linear methods do not account for the inherent nonlinear structure associated with these different types of data. The motivation behind this work is to model nonlinear relationships in multiomics and COVID-19 outcomes, and to determine key multidimensional molecules associated with the disease. Toward this goal, we develop scalable randomized kernel methods for jointly associating data from multiple sources or views and simultaneously predicting an outcome or classifying a unit into one of two or more classes. We also determine variables or groups of variables that best contribute to the relationships among the views. We use the idea that random Fourier bases can approximate shift-invariant kernel functions to construct nonlinear mappings of each view, and we use these mappings and the outcome variable to learn view-independent low-dimensional representations. We demonstrate the effectiveness of the proposed methods through extensive simulations. When the proposed methods were applied to gene expression, metabolomics, proteomics, and lipidomics data pertaining to COVID-19, we identified several molecular signatures for COVID-19 status and severity. Our results agree with previous findings and suggest potential avenues for future research. Our algorithms are implemented in PyTorch, interfaced in R, and available at: https://github.com/lasandrall/RandMVLearn.
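The abstract's core idea, that random Fourier bases can approximate a shift-invariant kernel, can be illustrated with a minimal sketch. This is not the authors' RandMVLearn implementation (which is in PyTorch); it is a plain-Python illustration that assumes a Gaussian kernel with bandwidth `sigma`, and the function names `sample_rff_params`, `rff_map`, and `gaussian_kernel` are hypothetical. The inner product of two randomized feature vectors should be close to the exact kernel value, with the error shrinking as the number of features grows.

```python
import math
import random


def sample_rff_params(dim, num_features, sigma, rng):
    """Draw frequencies and phases for a Gaussian kernel (assumed kernel choice).

    Frequencies come from N(0, 1/sigma^2) per coordinate, which is the
    Fourier transform of exp(-||x - y||^2 / (2 * sigma^2)).
    Phases come from Uniform[0, 2*pi).
    """
    omegas = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
              for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return omegas, phases


def rff_map(x, omegas, phases):
    """Map vector x to random Fourier features z(x) = sqrt(2/D) * cos(w.x + b)."""
    scale = math.sqrt(2.0 / len(omegas))
    return [scale * math.cos(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)
            for w, b in zip(omegas, phases)]


def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel value, used only to check the approximation."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sigma = 1.5
omegas, phases = sample_rff_params(dim=3, num_features=4000, sigma=sigma, rng=rng)

x = [0.2, -0.4, 1.0]
y = [0.5, 0.1, 0.3]
zx = rff_map(x, omegas, phases)
zy = rff_map(y, omegas, phases)

approx = sum(a * b for a, b in zip(zx, zy))  # z(x) . z(y)
exact = gaussian_kernel(x, y, sigma)
print(f"approximate kernel: {approx:.4f}, exact kernel: {exact:.4f}")
```

In a multiview setting, one such mapping would presumably be built per view (each with its own parameters), and the resulting feature matrices would then feed a linear downstream model; how the paper combines them with the outcome is not detailed in the abstract.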
Pages: 16
Related papers
50 in total
  • [1] Scalable randomized kernel methods for multiview data integration and prediction with application to Coronavirus disease
    Safo, Sandra E.
    Lu, Han
    BIOSTATISTICS, 2025, 26 (01)
  • [2] Deep kernel dimensionality reduction for scalable data integration
    Sokolovska, Nataliya
    Clement, Karine
    Zucker, Jean-Daniel
    INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2016, 74 : 121 - 132
  • [3] Integration of clinical and microarray data with kernel methods
    Daemen, Anneleen
    Gevaert, Olivier
    De Moor, Bart
    2007 ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOLS 1-16, 2007, : 5411 - 5415
  • [4] Multi-Kernel Classification for Integration of Clinical and Imaging Data: Application to Prediction of Cognitive Decline in Older Adults
    Filipovych, Roman
    Resnick, Susan M.
    Davatzikos, Christos
    MACHINE LEARNING IN MEDICAL IMAGING, 2011, 7009 : 26 - +
  • [5] Handling missing values in kernel methods with application to microbiology data
    Belanche, Lluis A.
    Kobayashi, Vladimer
    Aluja, Tomas
    NEUROCOMPUTING, 2014, 141 : 110 - 116
  • [6] Multiview Incomplete Knowledge Graph Integration with application to cross-institutional EHR data harmonization
    Zhou, Doudou
    Gan, Ziming
    Shi, Xu
    Patwari, Alina
    Rush, Everett
    Bonzel, Clara-Lea
    Panickan, Vidul A.
    Hong, Chuan
    Ho, Yuk-Lam
    Cai, Tianrun
    Costa, Lauren
    Li, Xiaoou
    Castro, Victor M.
    Murphy, Shawn N.
    Brat, Gabriel
    Weber, Griffin
    Avillach, Paul
    Gaziano, J. Michael
    Cho, Kelly
    Liao, Katherine P.
    Lu, Junwei
    Cai, Tianxi
    JOURNAL OF BIOMEDICAL INFORMATICS, 2022, 133
  • [7] Application of Data Mining Methods in Diabetes Prediction
    Komi, Messan
    Li, Jun
    Zhai, Yongxin
    Zhang, Xianguo
    2017 2ND INTERNATIONAL CONFERENCE ON IMAGE, VISION AND COMPUTING (ICIVC 2017), 2017, : 1006 - 1010
  • [8] A Unified and Comprehensible View of Parametric and Kernel Methods for Genomic Prediction with Application to Rice
    Jacquin, Laval
    Tuong-Vi Cao
    Ahmadi, Nourollah
    FRONTIERS IN GENETICS, 2016, 7
  • [9] Data integration for prediction of weight loss in randomized controlled dietary trials
    Nielsen, Rikke Linnemann
    Helenius, Marianne
    Garcia, Sara L.
    Roager, Henrik M.
    Aytan-Aktug, Derya
    Hansen, Lea Benedicte Skov
    Lind, Mads Vendelbo
    Vogt, Josef K.
    Dalgaard, Marlene Danner
    Bahl, Martin I.
    Jensen, Cecilia Bang
    Muktupavela, Rasa
    Warinner, Christina
    Aaskov, Vincent
    Gobel, Rikke
    Kristensen, Mette
    Frokiaer, Hanne
    Sparholt, Morten H.
    Christensen, Anders F.
    Vestergaard, Henrik
    Hansen, Torben
    Kristiansen, Karsten
    Brix, Susanne
    Petersen, Thomas Nordahl
    Lauritzen, Lotte
    Licht, Tine Rask
    Pedersen, Oluf
    Gupta, Ramneek
    SCIENTIFIC REPORTS, 2020, 10 (01)