Big data in multi-block data analysis: An approach to parallelizing Partial Least Squares Mode B algorithm

Cited by: 2
Authors
Martinez-Ruiz, Alba [1]
Montanola-Sales, Cristina [2, 3]
Affiliations
[1] Univ Catolica Santisima Concepcion, Alonso Ribera 2850, Concepcion, Chile
[2] URL, IQS, Via Augusta 390, Barcelona 08017, Spain
[3] CNS, BSC, Jordi Girona 29, Barcelona 08034, Spain
Keywords
Computer science; Computational mathematics; Variables
DOI
10.1016/j.heliyon.2019.e01451
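Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09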
Abstract
Partial Least Squares (PLS) Mode B is a multi-block method and a tightly coupled algorithm for estimating structural equation models (SEMs). After describing key aspects of parallel computing, we parallelize the PLS Mode B algorithm so that it operates on large distributed data. We show the scalability and performance of the algorithm at a very fine-grained level thanks to the versatility of pbdR, an R-project library for parallel computing. We vary several factors under different data distribution schemes in a supercomputing environment. Shorter elapsed times are obtained for the square blocking factor 16 x 16 using a grid of processors as square as possible, and for the non-square blocking factors 1000 x 4 and 10000 x 4 using a one-column grid of processors. Depending on the configuration, distributing data over a larger number of cores yields speedups of up to 121 over the CPU implementation. Moreover, we show that SEMs can be estimated with big data sets using current state-of-the-art algorithms for multi-block data analysis.
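The abstract refers to blocking factors and processor grids without showing how they interact. The following is a minimal sketch, assuming the two-dimensional block-cyclic layout commonly used by ScaLAPACK-style distributed matrix libraries; it is not the authors' code. The function names (`owner`, `local_counts`, `speedup`) and the toy matrix sizes are hypothetical and chosen only for illustration.

```python
from collections import Counter


def owner(i, j, mb, nb, nprow, npcol):
    """Grid coordinates of the process that owns global entry (i, j).

    mb x nb is the blocking factor; nprow x npcol is the processor grid.
    Assumes a 2D block-cyclic layout with the first block on process (0, 0).
    """
    return (i // mb) % nprow, (j // nb) % npcol


def local_counts(n_rows, n_cols, mb, nb, nprow, npcol):
    """Count how many matrix entries each process in the grid holds."""
    counts = Counter()
    for i in range(n_rows):
        for j in range(n_cols):
            counts[owner(i, j, mb, nb, nprow, npcol)] += 1
    return counts


def speedup(t_reference, t_parallel):
    """Speedup as the ratio of reference elapsed time to parallel elapsed time."""
    return t_reference / t_parallel


# Toy case 1: 8 x 4 matrix, square 2 x 2 blocks, square 2 x 2 grid.
square_grid = local_counts(8, 4, mb=2, nb=2, nprow=2, npcol=2)

# Toy case 2: same matrix, tall 2 x 4 blocks, one-column 4 x 1 grid,
# so every process holds complete rows of the matrix.
one_column_grid = local_counts(8, 4, mb=2, nb=4, nprow=4, npcol=1)
```

In both toy cases the 32 entries are spread evenly, 8 per process. The difference is the shape of each local piece: with a one-column grid and blocks as wide as the matrix, no row is split across processes, which is one plausible reason such a configuration can suit tall data sets with few variables.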
Pages: 29