Multiset sparse redundancy analysis for high-dimensional omics data

Cited by: 5
Authors
Csala, Attila [1]
Hof, Michel H. [1]
Zwinderman, Aeilko H. [1]
Affiliations
[1] Acad Med Ctr, Dept Clin Epidemiol Biostat & Bioinformat, NL-1105 AZ Amsterdam, Netherlands
Keywords
high-dimensional data; multivariate statistics; omics data; redundancy analysis
DOI
10.1002/bimj.201700248
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Redundancy Analysis (RDA) is a well-known method used to describe the directional relationship between related data sets. Recently, we proposed sparse Redundancy Analysis (sRDA) for high-dimensional genomic data analysis to find explanatory variables that explain the most variance of the response variables. As more and more biomolecular data become available from different biological levels, such as genotypic and phenotypic data from different omics domains, a natural research direction is to apply an integrated analysis approach in order to explore the underlying biological mechanism of certain phenotypes of the given organism. We show that the multiset sparse Redundancy Analysis (multi-sRDA) framework is a prominent candidate for high-dimensional omics data analysis since it accounts for the directional information transfer between omics sets, and, through its sparse solutions, the interpretability of the result is improved. In this paper, we also describe a software implementation for multi-sRDA, based on the Partial Least Squares Path Modeling algorithm. We test our method through simulation and real omics data analysis with data sets of 364,134 methylation markers, 18,424 gene expression markers, and 47 cytokine markers measured on 37 patients with Marfan syndrome.
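The core idea in the abstract, a sparse weight vector on the explanatory variables whose latent score explains as much variance as possible in the response variables, can be illustrated with a small sketch. This is not the authors' implementation: it covers a single pair of data sets rather than the multiset case, and it swaps the penalized regression step for a simple relative soft-threshold. The function names `soft_threshold` and `sparse_rda_weights`, the parameter `lam`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t; entries with |v| <= t become 0."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_rda_weights(X, Y, lam=0.5, n_iter=200, tol=1e-10):
    """Hypothetical sketch: one sparse weight vector for X that explains Y.

    lam in [0, 1) is a relative threshold: a variable is kept only if its
    covariance with the response score exceeds lam times the largest one.
    """
    # Standardize columns so covariances are comparable across variables.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    n, p = X.shape
    alpha = np.full(p, 1.0 / np.sqrt(p))  # start from equal weights
    for _ in range(n_iter):
        xi = X @ alpha            # latent score of the explanatory set
        beta = Y.T @ xi / n       # covariance of each response with xi
        eta = Y @ beta            # response score built from those covariances
        c = X.T @ eta / n         # covariance of each explanatory variable with eta
        new_alpha = soft_threshold(c, lam * np.max(np.abs(c)))
        norm = np.linalg.norm(new_alpha)
        if norm == 0.0:
            break                 # nothing survived; keep the previous weights
        new_alpha /= norm
        converged = np.linalg.norm(new_alpha - alpha) < tol
        alpha = new_alpha
        if converged:
            break
    return alpha


# Simulated example (assumed data): only the first two of 30 explanatory
# variables drive the three response variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))
Y = X[:, :2] @ np.ones((2, 3)) + 0.1 * rng.normal(size=(500, 3))
alpha = sparse_rda_weights(X, Y, lam=0.5)
print(np.flatnonzero(alpha))  # indices of the selected explanatory variables
```

The relative threshold keeps the sketch free of scale tuning; a penalized regression such as the elastic net would be the more usual choice when the explanatory variables are strongly correlated.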
Pages: 406-423
Page count: 18