Big data in multi-block data analysis: An approach to parallelizing Partial Least Squares Mode B algorithm

Cited by: 2
Authors
Martinez-Ruiz, Alba [1]
Montanola-Sales, Cristina [2, 3]
Affiliations
[1] Univ Catolica Santisima Concepcion, Alonso Ribera 2850, Concepcion, Chile
[2] URL, IQS, Via Augusta, 390, Barcelona 08017, Spain
[3] CNS, BSC, Jordi Girona 29, Barcelona 08034, Spain
Keywords
Computer science; Computational mathematics; VARIABLES;
DOI
10.1016/j.heliyon.2019.e01451
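Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]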
Discipline classification code
07; 0710; 09
Abstract
Partial Least Squares (PLS) Mode B is a multi-block method and a tightly coupled algorithm for estimating structural equation models (SEMs). Describing key aspects of parallel computing, we approach the parallelization of the PLS Mode B algorithm to operate on large distributed data. We show the scalability and performance of the algorithm at a very fine-grained level thanks to the versatility of pbdR, an R-project library for parallel computing. We vary several factors under different data distribution schemes in a supercomputing environment. Shorter elapsed times are obtained for the square blocking factor 16 x 16 on a grid of processors that is as square as possible, and for the non-square blocking factors 1000 x 4 and 10000 x 4 on a one-column grid of processors. Depending on the configuration, distributing data across a larger number of cores allows reaching speedups of up to 121 over the CPU implementation. Moreover, we show that SEMs can be estimated with big data sets using current state-of-the-art algorithms for multi-block data analysis.
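The blocking factors and processor grids named in the abstract can be illustrated with a minimal sketch. It assumes a ScaLAPACK-style two-dimensional block-cyclic layout, and it is not the authors' code. The function names `owner` and `local_counts`, and the small matrix sizes in the demo, are hypothetical and chosen only for illustration.

```python
from collections import Counter


def owner(i, j, mb, nb, nprow, npcol):
    """Return the (process_row, process_col) that owns global element (i, j).

    Indices are 0-based. mb x nb is the blocking factor and
    nprow x npcol is the shape of the processor grid (assumed layout:
    2D block-cyclic, blocks dealt out round-robin along each dimension).
    """
    return (i // mb) % nprow, (j // nb) % npcol


def local_counts(n_rows, n_cols, mb, nb, nprow, npcol):
    """Count how many matrix elements land on each process."""
    counts = Counter()
    for bi in range(0, n_rows, mb):
        for bj in range(0, n_cols, nb):
            rows = min(mb, n_rows - bi)  # the last block may be smaller
            cols = min(nb, n_cols - bj)
            counts[owner(bi, bj, mb, nb, nprow, npcol)] += rows * cols
    return counts


if __name__ == "__main__":
    # Square blocking factor 16 x 16 on a square 2 x 2 grid (hypothetical sizes).
    print(dict(local_counts(64, 64, 16, 16, 2, 2)))
    # Non-square blocking factor 1000 x 4 on a one-column 4 x 1 grid.
    print(dict(local_counts(4000, 4, 1000, 4, 4, 1)))
```

In both demo configurations every process receives the same number of elements, which is the kind of load balance that the choice of blocking factor and grid shape is meant to control.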
Pages: 29
Related papers
50 in total
  • [11] Discriminant partial least squares analysis on compositional data
    Gallo, Michele
    STATISTICAL MODELLING, 2010, 10 (01) : 41 - 56
  • [12] ANALYSIS OF MIXTURE DATA WITH PARTIAL LEAST-SQUARES
    KETTANEHWOLD, N
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 1992, 14 (1-3) : 57 - 69
  • [13] Penalized Partial Least Squares for Multi-label Data
    Liu, Huawen
    Ma, Zongjie
    Zhao, Jianmin
    Zheng, Zhonglong
    2014 PROCEEDINGS OF THE IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM 2014), 2014, : 515 - 520
  • [14] Multi-block ADMM for big data optimization in modern communication networks
    Liu, Lanchao
    Han, Zhu
    Journal of Communications, 2015, 10 (09) : 666 - 676
  • [15] A statistical method for massive data based on partial least squares algorithm
    Xu Y.
    Applied Mathematics and Nonlinear Sciences, 2024, 9 (01)
  • [16] A multi-block clustering algorithm for high dimensional binarized sparse data
    Kosztyan, Zsolt T.
    Telcs, Andras
    Abonyi, Janos
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 191
  • [17] Nonlinear Partial Least Squares for Consistency Analysis of Meteorological Data
    Meng, Zhen
    Zhang, Shichang
    Yang, Yan
    Liu, Ming
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2015, 2015
  • [18] Partial least squares analysis of neuroimaging data: applications and advances
    McIntosh, AR
    Lobaugh, NJ
    NEUROIMAGE, 2004, 23 : S250 - S263
  • [19] Multivariate analysis of fMRI data by oriented partial least squares
    Rayens, William S.
    Andersen, Anders H.
    MAGNETIC RESONANCE IMAGING, 2006, 24 (07) : 953 - 958
  • [20] SAS® partial least squares regression for analysis of spectroscopic data
    Reeves, JB
    Delwiche, SR
    JOURNAL OF NEAR INFRARED SPECTROSCOPY, 2003, 11 (06) : 415 - 431