Scalable randomized kernel methods for multiview data integration and prediction with application to Coronavirus disease

Citations: 0
Authors
Safo, Sandra E. [1]
Lu, Han [1]
Affiliations
[1] Univ Minnesota, Div Biostat & Hlth Data Sci, 2221 Univ Ave SE, Minneapolis, MN 55414 USA
Funding
US National Institutes of Health
Keywords
data integration; high-dimensional data; kernel; multiview learning; nonlinearity; randomized Fourier features; discriminant analysis
DOI
Not available
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
There is still more to learn about the pathobiology of coronavirus disease (COVID-19) despite 4 years of the pandemic. A multiomics approach offers a comprehensive view of the disease and has the potential to yield deeper insight into its pathogenesis. Previous multiomics integrative analysis and prediction studies for COVID-19 severity and status have assumed simple (i.e., linear) relationships between omics data and between omics and COVID-19 outcomes. However, these linear methods do not account for the inherent underlying nonlinear structure associated with these different types of data. The motivation behind this work is to model nonlinear relationships in multiomics and COVID-19 outcomes, and to determine key multidimensional molecules associated with the disease. Toward this goal, we develop scalable randomized kernel methods for jointly associating data from multiple sources or views and simultaneously predicting an outcome or classifying a unit into one of 2 or more classes. We also determine variables or groups of variables that best contribute to the relationships among the views. We use the idea that random Fourier bases can approximate shift-invariant kernel functions to construct nonlinear mappings of each view, and we use these mappings and the outcome variable to learn view-independent low-dimensional representations. We demonstrate the effectiveness of the proposed methods through extensive simulations. When the proposed methods were applied to gene expression, metabolomics, proteomics, and lipidomics data pertaining to COVID-19, we identified several molecular signatures for COVID-19 status and severity. Our results agree with previous findings and suggest potential avenues for future research. Our algorithms are implemented in PyTorch and interfaced in R, and are available at: https://github.com/lasandrall/RandMVLearn.
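The abstract's central idea, that random Fourier bases can approximate a shift-invariant kernel, can be illustrated with a minimal NumPy sketch. This is not the authors' RandMVLearn implementation: the Gaussian kernel, the bandwidth `sigma`, the feature count `D`, the simulated single view `X`, and both function names are assumptions chosen purely for illustration. The sketch draws random frequencies and phases, builds the cosine feature map, and compares the resulting inner products with the exact kernel matrix.

```python
import numpy as np


def gaussian_kernel(X, Y, sigma=1.0):
    """Exact Gaussian (RBF) kernel matrix, k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dists = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))


def random_fourier_features(X, W, b):
    """Map X (n x p) to D random cosine features so that Z @ Z.T approximates the kernel."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)


# Illustrative settings (assumptions, not values from the paper).
rng = np.random.default_rng(0)
n, p, D, sigma = 50, 10, 5000, 2.0

X = rng.standard_normal((n, p))                 # one simulated "view"
W = rng.normal(0.0, 1.0 / sigma, size=(p, D))   # frequencies drawn from the kernel's spectral density
b = rng.uniform(0.0, 2.0 * np.pi, size=D)       # random phases

Z = random_fourier_features(X, W, b)            # nonlinear mapping of the view
K_exact = gaussian_kernel(X, X, sigma)
K_approx = Z @ Z.T                              # linear inner products in the random feature space

max_err = np.abs(K_exact - K_approx).max()
print(f"max absolute approximation error: {max_err:.4f}")
```

Because the mapped data `Z` has `D` columns regardless of the sample size, downstream linear methods can work on `Z` directly and never need to form the full n x n kernel matrix, which is the usual source of the scalability such methods aim for.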
Pages: 16
Related papers
50 in total
  • [31] Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction
    Re, Matteo
    Valentini, Giorgio
    PROCEEDINGS OF THE THIRD INTERNATIONAL WORKSHOP ON MACHINE LEARNING IN SYSTEMS BIOLOGY, 2010, 8 : 98 - 111
  • [32] Methods of comprehensive geophysical data for prediction of porosity and analysis of its application
    Wang, Yonggang
    Yue, Youxi
    Shiyou Diqiu Wuli Kantan/Oil Geophysical Prospecting, 2001, 36 (06):
  • [33] Methods for integration of remote sensing data and crop model and their prospects in agricultural application
    National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
    Nongye Gongcheng Xuebao, 2008, 11 (295-301):
  • [34] Deep Learning-Based Multiomics Data Integration Methods for Biomedical Application
    Wen, Yuqi
    Zheng, Linyi
    Leng, Dongjin
    Dai, Chong
    Lu, Jing
    Zhang, Zhongnan
    He, Song
    Bo, Xiaochen
    ADVANCED INTELLIGENT SYSTEMS, 2023, 5 (05)
  • [35] Integration of connectionist methods and chaotic time-series analysis for the prediction of process data
    Kozma, R
    Kasabov, NK
    Kim, JS
    Cohen, A
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 1998, 13 (06) : 519 - 538
  • [36] Application of harmony search algorithm in optimizing autoregressive A on a data set of Coronavirus Disease 2019
    Karmakar, Riya
    Chatterjee, Sandip
    Datta, Debabrata
    Chakraborty, Dipankar
    SYSTEMS AND SOFT COMPUTING, 2024, 6
  • [37] Data Integration Using Tensor Decomposition for the Prediction of miRNA-Disease Associations
    Luo, JiaWei
    Liu, Yi
    Liu, Pei
    Lai, Zihan
    Wu, Hao
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2022, 26 (05) : 2370 - 2378
  • [38] A Survey on an Analysis of Big Data Open Source Datasets, Techniques and Tools for the Prediction of Coronavirus Disease
    Rayan, R. Ame
    Suruliandi, A.
    Raja, S. P.
    David, H. Benjamin Fredrick
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2023, 32 (12)
  • [39] Data resources and computational methods for lncRNA-disease association prediction
    Sheng, Nan
    Huang, Lan
    Lu, Yuting
    Wang, Hao
    Yang, Lili
    Gao, Ling
    Xie, Xuping
    Fu, Yuan
    Wang, Yan
    COMPUTERS IN BIOLOGY AND MEDICINE, 2023, 153
  • [40] Prediction of mortality in patients with cardiovascular disease using data mining methods
    Imamovic, Damir
    Babovic, Elmir
    Bijedic, Nina
    2020 19TH INTERNATIONAL SYMPOSIUM INFOTEH-JAHORINA (INFOTEH), 2020,