Spark-Based Large-Scale Matrix Inversion for Big Data Processing

Cited by: 34

Authors:

Liu, Jun [1]

Liang, Yang [1]

Ansari, Nirwan [2]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Ctr Data Sci, Beijing 100876, Peoples R China

[2] New Jersey Inst Technol, Dept Elect & Comp Engn, Newark, NJ 07102 USA

Source:

IEEE Access | 2016 / Vol. 4

Keywords:

Matrix inversion; LU decomposition; linear algebra; parallel algorithm; distributed computing; Spark;

DOI:

10.1109/ACCESS.2016.2546544

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Matrix inversion is a fundamental operation for solving linear equations in many computational applications, especially in various emerging big data applications. However, it is a challenging task to invert large-scale matrices of extremely high order (several thousands or millions), which are common in most Web-scale systems, such as social networks and recommendation systems. In this paper, we present an LU (lower-upper) decomposition-based block-recursive algorithm for large-scale matrix inversion. We present its well-designed implementation with an optimized data structure, reduced space complexity, and effective matrix multiplication on the Spark parallel computing platform. The experimental evaluation results show that the proposed algorithm efficiently inverts large-scale matrices on a cluster composed of commodity servers and is scalable for inverting even larger matrices. The proposed algorithm and implementation will become a solid foundation for building a high-performance linear algebra library on Spark for big data processing and applications.
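
The block-recursive LU idea named in the abstract can be illustrated on a single machine. The following is a minimal sketch only, not the authors' Spark implementation: it assumes a dense square matrix stored as a list of lists, uses no pivoting (so every leading principal block must be nonsingular), and all function names (`matmul`, `assemble`, `lu_factor_inverses`, `invert`) are hypothetical. It relies on the identity A^{-1} = U^{-1} L^{-1} for A = LU, and builds L^{-1} and U^{-1} recursively from the four quadrants of A.

```python
def matmul(X, Y):
    """Dense product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def sub(X, Y):
    """Elementwise difference X - Y."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def neg(X):
    """Elementwise negation."""
    return [[-x for x in row] for row in X]


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


def assemble(TL, TR, BL, BR):
    """Glue four quadrants back into one matrix."""
    return [a + b for a, b in zip(TL, TR)] + [a + b for a, b in zip(BL, BR)]


def lu_factor_inverses(A):
    """Return (L^-1, U^-1) for A = L U, with L unit lower triangular.

    Assumption: no pivoting, so each leading principal block of A
    must be nonsingular.
    """
    n = len(A)
    if n == 1:
        if A[0][0] == 0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        return [[1.0]], [[1.0 / A[0][0]]]

    k = n // 2
    A11 = [row[:k] for row in A[:k]]
    A12 = [row[k:] for row in A[:k]]
    A21 = [row[:k] for row in A[k:]]
    A22 = [row[k:] for row in A[k:]]

    # Factor the top-left block, then derive the off-diagonal LU blocks.
    L11i, U11i = lu_factor_inverses(A11)
    U12 = matmul(L11i, A12)          # U12 = L11^-1 A12
    L21 = matmul(A21, U11i)          # L21 = A21 U11^-1
    S = sub(A22, matmul(L21, U12))   # Schur complement, S = L22 U22
    L22i, U22i = lu_factor_inverses(S)

    # Inverses of block-triangular matrices in closed form.
    Linv = assemble(L11i, zeros(k, n - k),
                    neg(matmul(matmul(L22i, L21), L11i)), L22i)
    Uinv = assemble(U11i, neg(matmul(matmul(U11i, U12), U22i)),
                    zeros(n - k, k), U22i)
    return Linv, Uinv


def invert(A):
    """A^-1 = U^-1 L^-1."""
    Linv, Uinv = lu_factor_inverses(A)
    return matmul(Uinv, Linv)
```

In this sketch nearly all the work is matrix multiplication on sub-blocks, which is the kind of operation that can be distributed across a cluster; how the paper partitions and schedules those blocks on Spark is not shown here.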

Pages: 2166 - 2176

Number of pages: 11

50 items in total

[41] On the Clustering of Large-scale Data: A Matrix-based Approach
Wang, Lijun
Dong, Ming
2011 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2011 : 139 - 144
[42] Efficient Processing of Recursive Joins on Large-Scale Datasets in Spark
Thuong-Cang Phan
Anh-Cang Phan
Thi-To-Quyen Tran
Ngoan-Thanh Trieu
ADVANCED COMPUTATIONAL METHODS FOR KNOWLEDGE ENGINEERING (ICCSAMA 2019), 2020, 1121 : 391 - 402
[43] Evaluation of Large-scale Complex Systems Effectiveness Based on Big Data
Sun Zhi-peng
Chen Gui-ming
Zhang Hui
ICBDC 2019: PROCEEDINGS OF 2019 4TH INTERNATIONAL CONFERENCE ON BIG DATA AND COMPUTING, 2019 : 72 - 76
[44] A spark-based big data analysis framework for real-time sentiment prediction on streaming data
Kilinc, Deniz
SOFTWARE-PRACTICE & EXPERIENCE, 2019, 49 (09) : 1352 - 1364
[45] Spark-based Rare Association Rule Mining for Big Datasets
Liu, Ruilin
Yang, Kai
Sun, Yanjia
Quan, Tao
Yang, Jin
2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016 : 2734 - 2739
[46] CONSTRUCT Queries Performance on a Spark-Based Big RDF Triplestore
Sanchez-Ayte, Adam
Jouanot, Fabrice
Rousset, Marie-Christine
SEMANTIC WEB, ESWC 2022, 2022, 13261 : 444 - 460
[47] A novel spark-based multi-step forecasting algorithm for big data time series
Galicia, A.
Torres, J. F.
Martinez-Alvarez, F.
Troncoso, A.
INFORMATION SCIENCES, 2018, 467 : 800 - 818
[48] Inversion of large-scale gravity data with application of VNet
Huang, R.
Zhang, Y.
Vatankhah, S.
Liu, S.
Qi, R.
GEOPHYSICAL JOURNAL INTERNATIONAL, 2022, 231 (01) : 306 - 318
[49] Computational structures of functional units for large-scale matrix inversion
Zhukov, I.A.
Engineering Simulation, 1995, 12 (04) : 564 - 568
[50] KP-S: A Spark-based Design of the K-Prototypes Clustering for Big Data
Ben HajKacem, Mohamed Aymen
Ben N'cir, Chiheb-Eddine
Essoussi, Nadia
2017 IEEE/ACS 14TH INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2017 : 557 - 563