Multiset sparse partial least squares path modeling for high dimensional omics data analysis

Cited by: 9
Authors
Csala, Attila [1]
Zwinderman, Aeilko H. [1]
Hof, Michel H. [1]
Affiliations
[1] Univ Amsterdam, Dept Clin Epidemiol Biostat & Bioinformat, NL-1105 AZ Amsterdam, Netherlands
Keywords
Multivariate analysis; High dimensional omics data; Partial least squares; Redundancy analysis; Variable selection; Integrating data; Marfan syndrome; Central dogma; Expression; Multiblock; Phenotype; Gene; Beta
DOI
10.1186/s12859-019-3286-3
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Recent technological developments have enabled the measurement of a plethora of biomolecular data from various omics domains, and research is ongoing on statistical methods that leverage these omics data to better model and understand the biological pathways and genetic architectures of complex phenotypes. Current reviews report that the simultaneous analysis of multiple (i.e. three or more) high dimensional omics data sources is still challenging and that suitable statistical methods are unavailable. Frequently mentioned challenges are the failure to account for the hierarchical structure between omics domains and the difficulty of interpreting genome-wide results. This study aims to address these challenges. We propose multiset sparse Partial Least Squares path modeling (msPLS), a generalized penalized form of Partial Least Squares path modeling, for the simultaneous modeling of biological pathways across multiple omics domains. msPLS simultaneously models the effect of multiple molecular markers, from multiple omics domains, on the variation of multiple phenotypic variables, while accounting for the relationships between data sources, and provides sparse results. The sparsity in the model helps to provide interpretable results from analyses of hundreds of thousands of biomolecular variables.
Results: With simulation studies, we quantified the ability of msPLS to discover associated variables among high dimensional data sources. Furthermore, we analysed high dimensional omics datasets to explore biological pathways associated with Marfan syndrome and with Chronic Lymphocytic Leukaemia. Additionally, we compared the results of msPLS to the results of Multi-Omics Factor Analysis (MOFA), which is an alternative method to analyse this type of data.
Conclusions: msPLS is a multiset multivariate method for the integrative analysis of multiple high dimensional omics data sources. It accounts for the relationship between multiple high dimensional data sources while providing interpretable results through its sparse solutions. The biomarkers found by msPLS in the omics datasets can be interpreted in terms of biological pathways associated with the pathophysiology of Marfan syndrome and of Chronic Lymphocytic Leukaemia. Additionally, msPLS outperforms MOFA in terms of variation explained in the Chronic Lymphocytic Leukaemia dataset, while it identifies the two most important clinical markers for Chronic Lymphocytic Leukaemia.
Availability: http://uva.csala.me/mspls, https://github.com/acsala/2018_msPLS
Pages: 21
Journal
BMC Bioinformatics, 21