Scalable randomized kernel methods for multiview data integration and prediction with application to Coronavirus disease

Authors
Safo, Sandra E. [1]
Lu, Han [1]
Affiliations
[1] Univ Minnesota, Div Biostat & Hlth Data Sci, 2221 Univ Ave SE, Minneapolis, MN 55414 USA
Funding
U.S. National Institutes of Health (NIH)
Keywords
data integration; high-dimensional data; kernel; multiview learning; nonlinearity; randomized Fourier features; DISCRIMINANT-ANALYSIS;
DOI
Not available
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
There is still more to learn about the pathobiology of coronavirus disease (COVID-19) despite 4 years of the pandemic. A multiomics approach offers a comprehensive view of the disease and has the potential to yield deeper insight into its pathogenesis. Previous multiomics integrative analysis and prediction studies for COVID-19 severity and status have assumed simple (i.e., linear) relationships between omics data and between omics and COVID-19 outcomes. However, these linear methods do not account for the inherent nonlinear structure of these different types of data. The motivation behind this work is to model nonlinear relationships in multiomics and COVID-19 outcomes, and to determine key multidimensional molecules associated with the disease. Toward this goal, we develop scalable randomized kernel methods for jointly associating data from multiple sources or views and simultaneously predicting an outcome or classifying a unit into one of two or more classes. We also determine variables or groups of variables that best contribute to the relationships among the views. We use the idea that random Fourier bases can approximate shift-invariant kernel functions to construct nonlinear mappings of each view, and we use these mappings and the outcome variable to learn view-independent low-dimensional representations. We demonstrate the effectiveness of the proposed methods through extensive simulations. When the proposed methods were applied to gene expression, metabolomics, proteomics, and lipidomics data pertaining to COVID-19, we identified several molecular signatures for COVID-19 status and severity. Our results agree with previous findings and suggest potential avenues for future research. Our algorithms are implemented in PyTorch, interfaced in R, and available at: https://github.com/lasandrall/RandMVLearn.
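The abstract's central building block, approximating a shift-invariant kernel with random Fourier bases, can be illustrated with a minimal NumPy sketch. This is not the authors' code and does not use the RandMVLearn API; the function name `rff_map`, the choice of a Gaussian kernel, and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def rff_map(X, n_features, gamma, rng):
    """Hypothetical helper: random Fourier features for the Gaussian kernel
    k(x, y) = exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    # Frequencies are drawn from the kernel's spectral density, N(0, 2*gamma*I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))   # toy data standing in for one view: 50 samples, 5 variables
gamma = 0.1                    # assumed kernel bandwidth parameter

# Inner products of the random features approximate the kernel matrix.
Z = rff_map(X, n_features=5000, gamma=gamma, rng=rng)
K_approx = Z @ Z.T

# Exact Gaussian kernel matrix for comparison.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-gamma * sq_dists)

max_abs_err = np.abs(K_approx - K_exact).max()
print(f"max absolute approximation error: {max_abs_err:.4f}")
```

The feature matrix `Z` has 50 rows and 5000 columns, so anything fitted on it scales with the number of random features rather than with an n-by-n kernel matrix, which is the usual scalability argument for this kind of approximation. How the paper combines such mappings across views and with the outcome is not shown here.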
Pages: 16