Big data in multi-block data analysis: An approach to parallelizing Partial Least Squares Mode B algorithm

Cited by: 2
Authors
Martinez-Ruiz, Alba [1]
Montanola-Sales, Cristina [2, 3]
Affiliations
[1] Univ Catolica Santisima Concepcion, Alonso Ribera 2850, Concepcion, Chile
[2] URL, IQS, Via Augusta,390, Barcelona 08017, Spain
[3] CNS, BSC, Jordi Girona 29, Barcelona 08034, Spain
Keywords
Computer science; Computational mathematics; VARIABLES;
DOI
10.1016/j.heliyon.2019.e01451
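Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code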
07 ; 0710 ; 09 ;
Abstract
Partial Least Squares (PLS) Mode B is a multi-block method and a tightly coupled algorithm for estimating structural equation models (SEMs). After describing key aspects of parallel computing, we approach the parallelization of the PLS Mode B algorithm so that it can operate on large distributed data. We show the scalability and performance of the algorithm at a very fine-grained level thanks to the versatility of pbdR, an R-project library for parallel computing. We vary several factors under different data distribution schemes in a supercomputing environment. The shortest elapsed times are obtained for the square blocking factor 16 x 16 using a grid of processors that is as square as possible, and for the non-square blocking factors 1000 x 4 and 10000 x 4 using a one-column grid of processors. Depending on the configuration, distributing data across a larger number of cores allows speedups of up to 121 over the CPU implementation. Moreover, we show that SEMs can be estimated with big data sets using current state-of-the-art algorithms for multi-block data analysis.
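To make the terms "blocking factor" and "grid of processors" concrete, the following is a minimal Python sketch of a two-dimensional block-cyclic layout of the kind used by ScaLAPACK-style libraries. It assumes that this is the layout meant in the abstract, which the record itself does not state. The function names `owner` and `local_counts` and the toy matrix size are hypothetical and are not taken from the paper or from pbdR.

```python
from collections import Counter


def owner(i, j, mb, nb, nprow, npcol):
    """Return the (process row, process column) that owns global entry (i, j).

    Indices are 0-based. mb x nb is the blocking factor and nprow x npcol is
    the process grid. Assumes a plain block-cyclic layout starting at
    process (0, 0); this is an illustrative assumption, not the paper's code.
    """
    return (i // mb) % nprow, (j // nb) % npcol


def local_counts(n_rows, n_cols, mb, nb, nprow, npcol):
    """Count how many matrix entries each process holds under this layout."""
    counts = Counter()
    for i in range(n_rows):
        for j in range(n_cols):
            counts[owner(i, j, mb, nb, nprow, npcol)] += 1
    return counts


# Toy case: a 64 x 8 matrix, blocking factor 16 x 4, one-column grid of 4 processes.
counts = local_counts(64, 8, 16, 4, 4, 1)
for proc in sorted(counts):
    print(proc, counts[proc])
```

In this toy case every process in the one-column grid receives whole rows of the matrix, which is one plausible reading of why tall, narrow blocking factors pair naturally with a one-column grid. The sketch only illustrates the data layout and says nothing about the timings reported in the abstract.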
Pages: 29
Related papers
50 in total
  • [41] A data envelopment analysis and local partial least squares approach for identifying the optimal innovation policy direction
    Tziogkidis, Panagiotis
    Philippas, Dionisis
    Leontitsis, Alexandros
    Sickles, Robin C.
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2020, 285 (03) : 1011 - 1024
  • [42] A Non-iterative Partial Least Squares Algorithm for Supervised Learning with Collinear Data
    Qin, S. Joe
    2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021, : 3683 - 3688
  • [44] AN EXAMPLE OF 2-BLOCK PREDICTIVE PARTIAL LEAST-SQUARES REGRESSION WITH SIMULATED DATA
    GELADI, P
    KOWALSKI, BR
    ANALYTICA CHIMICA ACTA, 1986, 185 : 19 - 32
  • [45] Compressed Partial Least Squares Regression: A Supervised Method for Multi-label Data
    Ma, Zongjie
    Liu, Huawen
    Zheng, Zhonglong
    Zhao, Jianmin
    Xu, Xiaodan
    2014 11TH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (FSKD), 2014, : 385 - 389
  • [46] [MEG]PLS: A pipeline for MEG data analysis and partial least squares statistics
    Cheung, Michael J.
    Kovacevic, Natasa
    Fatima, Zainab
    Misic, Bratislav
    McIntosh, Anthony R.
    NEUROIMAGE, 2016, 124 : 181 - 193
  • [47] Application of Modified Partial Least Squares in Data Analysis of Traditional Chinese Medicine
    Xiong, Wangping
    Du, Jianqiang
    Nie, Bin
    Huang, Liping
    Zhou, Xian
    PROCEEDINGS OF 2017 6TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT 2017), 2017, : 231 - 235
  • [48] PARTIAL LEAST-SQUARES QUANTITATIVE-ANALYSIS OF INFRARED SPECTROSCOPIC DATA .1. ALGORITHM IMPLEMENTATION
    FULLER, MP
    RITTER, GL
    DRAPER, CS
    APPLIED SPECTROSCOPY, 1988, 42 (02) : 217 - 227
  • [49] Multivariate co-inertia analysis for qualitative data by partial least squares
    L. D’Ambra
    R. Lombardo
    P. Amenta
    Journal of the Italian Statistical Society, 2000, 9 (1-3) : 23 - 37
  • [50] A Partial Least Squares Algorithm for Microarray Data Analysis Using the VIP Statistic for Gene Selection and Binary Classification
    Burguillo, Francisco J.
    Corchete, Luis A.
    Martin, Javier
    Barrera, Inmaculada
    Bardsley, William G.
    CURRENT BIOINFORMATICS, 2014, 9 (03) : 348 - 359