Spark-Based Large-Scale Matrix Inversion for Big Data Processing

Cited by: 34
Authors
Liu, Jun [1]
Liang, Yang [1]
Ansari, Nirwan [2]
Affiliations
[1] Beijing Univ Posts & Telecommun, Ctr Data Sci, Beijing 100876, Peoples R China
[2] New Jersey Inst Technol, Dept Elect & Comp Engn, Newark, NJ 07102 USA
Source
IEEE ACCESS | 2016 / Vol. 4
Keywords
Matrix inversion; LU decomposition; linear algebra; parallel algorithm; distributed computing; Spark
DOI
10.1109/ACCESS.2016.2546544
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Matrix inversion is a fundamental operation for solving linear equations in many computational applications, especially in various emerging big data applications. However, it is a challenging task to invert large-scale matrices of extremely high order (several thousands or millions), which are common in most Web-scale systems, such as social networks and recommendation systems. In this paper, we present an LU (lower-upper) decomposition-based block-recursive algorithm for large-scale matrix inversion. We present its well-designed implementation with an optimized data structure, reduced space complexity, and effective matrix multiplication on the Spark parallel computing platform. The experimental evaluation results show that the proposed algorithm efficiently inverts large-scale matrices on a cluster composed of commodity servers and is scalable to inverting even larger matrices. The proposed algorithm and implementation will become a solid foundation for building a high-performance linear algebra library on Spark for big data processing and applications.
Pages: 2166-2176
Number of pages: 11
Related papers
50 in total
  • [41] On the Clustering of Large-scale Data: A Matrix-based Approach
    Wang, Lijun
    Dong, Ming
    2011 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2011, : 139 - 144
  • [42] Efficient Processing of Recursive Joins on Large-Scale Datasets in Spark
    Thuong-Cang Phan
    Anh-Cang Phan
    Thi-To-Quyen Tran
    Ngoan-Thanh Trieu
    ADVANCED COMPUTATIONAL METHODS FOR KNOWLEDGE ENGINEERING (ICCSAMA 2019), 2020, 1121 : 391 - 402
  • [43] Evaluation of Large-scale Complex Systems Effectiveness Based on Big Data
    Sun Zhi-peng
    Chen Gui-ming
    Zhang Hui
    ICBDC 2019: PROCEEDINGS OF 2019 4TH INTERNATIONAL CONFERENCE ON BIG DATA AND COMPUTING, 2019, : 72 - 76
  • [44] A spark-based big data analysis framework for real-time sentiment prediction on streaming data
    Kilinc, Deniz
    SOFTWARE-PRACTICE & EXPERIENCE, 2019, 49 (09): 1352 - 1364
  • [45] Spark-based Rare Association Rule Mining for Big Datasets
    Liu, Ruilin
    Yang, Kai
    Sun, Yanjia
    Quan, Tao
    Yang, Jin
    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 2734 - 2739
  • [46] CONSTRUCT Queries Performance on a Spark-Based Big RDF Triplestore
    Sanchez-Ayte, Adam
    Jouanot, Fabrice
    Rousset, Marie-Christine
    SEMANTIC WEB, ESWC 2022, 2022, 13261 : 444 - 460
  • [47] A novel spark-based multi-step forecasting algorithm for big data time series
    Galicia, A.
    Torres, J. F.
    Martinez-Alvarez, F.
    Troncoso, A.
    INFORMATION SCIENCES, 2018, 467 : 800 - 818
  • [48] Inversion of large-scale gravity data with application of VNet
    Huang, R.
    Zhang, Y.
    Vatankhah, S.
    Liu, S.
    Qi, R.
    GEOPHYSICAL JOURNAL INTERNATIONAL, 2022, 231 (01) : 306 - 318
  • [49] Computational structures of functional units for large-scale matrix inversion
    Zhukov, I.A.
    Engineering Simulation, 1995, 12 (04): 564 - 568
  • [50] KP-S: A Spark-based Design of the K-Prototypes Clustering for Big Data
    Ben HajKacem, Mohamed Aymen
    Ben N'cir, Chiheb-Eddine
    Essoussi, Nadia
    2017 IEEE/ACS 14TH INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2017, : 557 - 563