Spark-Based Large-Scale Matrix Inversion for Big Data Processing

Cited by: 34

Authors:

Liu, Jun [1]

Liang, Yang [1]

Ansari, Nirwan [2]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Ctr Data Sci, Beijing 100876, Peoples R China

[2] New Jersey Inst Technol, Dept Elect & Comp Engn, Newark, NJ 07102 USA

Source:

IEEE ACCESS | 2016 / Vol. 4

Keywords:

Matrix inversion; LU decomposition; linear algebra; parallel algorithm; distributed computing; Spark;

DOI:

10.1109/ACCESS.2016.2546544

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Matrix inversion is a fundamental operation for solving linear equations in many computational applications, especially in emerging big data applications. However, it is challenging to invert large-scale matrices of extremely high order (several thousands or millions), which are common in most Web-scale systems, such as social networks and recommendation systems. In this paper, we present a lower-upper (LU) decomposition-based block-recursive algorithm for large-scale matrix inversion. We present its implementation on the Spark parallel computing platform, with an optimized data structure, reduced space complexity, and effective matrix multiplication. The experimental evaluation results show that the proposed algorithm inverts large-scale matrices efficiently on a cluster composed of commodity servers and is scalable for inverting even larger matrices. The proposed algorithm and implementation will become a solid foundation for building a high-performance linear algebra library on Spark for big data processing and applications.
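
The abstract names the technique but not its steps, so the following is only a minimal single-machine sketch of one common form of LU-based block-recursive inversion, not the authors' Spark implementation. It assumes a square matrix whose leading principal blocks are all nonsingular (no pivoting is done), and every function name (`lu_inverses`, `invert`, and the helpers) is hypothetical. The matrix is split into a 2x2 block grid, the top-left block is factored recursively, the Schur complement is factored recursively, and the inverses of the triangular factors are assembled block by block, giving A^{-1} = U^{-1} L^{-1}.

```python
def matmul(a, b):
    """Dense matrix product of two lists-of-lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def sub(a, b):
    """Element-wise difference a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def neg(a):
    """Element-wise negation."""
    return [[-x for x in row] for row in a]


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


def assemble(tl, tr, bl, br):
    """Glue four blocks into one matrix."""
    top = [left + right for left, right in zip(tl, tr)]
    bottom = [left + right for left, right in zip(bl, br)]
    return top + bottom


def lu_inverses(a):
    """Return (L^-1, U^-1) for A = L U, with L unit lower triangular.

    Assumption: every leading principal block is nonsingular, so no
    pivoting is needed. This is a sketch, not a robust routine.
    """
    n = len(a)
    if n == 1:
        return [[1.0]], [[1.0 / a[0][0]]]
    k = n // 2
    a11 = [row[:k] for row in a[:k]]
    a12 = [row[k:] for row in a[:k]]
    a21 = [row[:k] for row in a[k:]]
    a22 = [row[k:] for row in a[k:]]

    l11_inv, u11_inv = lu_inverses(a11)        # A11 = L11 U11
    u12 = matmul(l11_inv, a12)                 # U12 = L11^-1 A12
    l21 = matmul(a21, u11_inv)                 # L21 = A21 U11^-1
    schur = sub(a22, matmul(l21, u12))         # S = A22 - L21 U12
    l22_inv, u22_inv = lu_inverses(schur)      # S = L22 U22

    # Block formulas for the inverse of a block-triangular matrix.
    l_inv = assemble(l11_inv, zeros(k, n - k),
                     neg(matmul(matmul(l22_inv, l21), l11_inv)), l22_inv)
    u_inv = assemble(u11_inv, neg(matmul(matmul(u11_inv, u12), u22_inv)),
                     zeros(n - k, k), u22_inv)
    return l_inv, u_inv


def invert(a):
    """A^-1 = U^-1 L^-1."""
    l_inv, u_inv = lu_inverses(a)
    return matmul(u_inv, l_inv)
```

For example, `invert([[2.0, 1.0], [1.0, 1.0]])` can be checked by multiplying the result with the input and comparing against the identity. In a distributed setting the block products would be the natural units to parallelize, but how the paper maps them onto Spark is not stated in this record.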

Pages: 2166 - 2176

Number of pages: 11

50 records in total

[31] Large-scale text processing pipeline with Apache Spark
Svyatkovskiy, A.
Imai, K.
Kroeger, M.
Shiraito, Y.
2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016: 3928 - 3935
[32] A new Apache Spark-based framework for big data streaming forecasting in IoT networks
Fernandez-Gomez, Antonio M.
Gutierrez-Aviles, David
Troncoso, Alicia
Martinez-Alvarez, Francisco
JOURNAL OF SUPERCOMPUTING, 2023, 79 (10): 11078 - 11100
[33] QN inversion of large-scale MT data
Avdeeva, A. D.
Avdeev, D. B.
PIERS 2006 CAMBRIDGE: PROGRESS IN ELECTROMAGNETICS RESEARCH SYMPOSIUM, PROCEEDINGS, 2006: 210 - +
[34] Super large-scale magnetic data inversion
Yang, Bo
Xu, Yixian
NEAR-SURFACE GEOPHYSICS AND GEOHAZARDS - PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON ENVIRONMENTAL AND ENGINEERING GEOPHYSICS, VOLS 1 AND 2, 2010: 777 - 782
[35] Spark-based parallel processing whale optimization algorithm
Alshayeji, Mohammad
Behbehani, Bader
Ahmad, Imtiaz
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (04)
[36] A Spark-Based Artificial Bee Colony Algorithm for Unbalanced Large Data Classification
Al-Sawwa, Jamil
Almseidin, Mohammad
INFORMATION, 2022, 13 (11)
[37] A new Apache Spark-based framework for big data streaming forecasting in IoT networks
Antonio M. Fernández-Gómez
David Gutiérrez-Avilés
Alicia Troncoso
Francisco Martínez-Álvarez
The Journal of Supercomputing, 2023, 79: 11078 - 11100
[38] Spark-based adaptive Mapreduce data processing method for remote sensing imagery
Tan, Xicheng
Di, Liping
Zhong, Yanfei
Yao, Yayu
Sun, Ziheng
Ali, Yahya
INTERNATIONAL JOURNAL OF REMOTE SENSING, 2021, 42 (01): 171 - 187
[39] On the Large-scale Graph Data Processing for User Interface Testing in Big Data Science Projects
Uygun, Yasin
Oguz, Ramazan Faruk
Olmezogullari, Erdi
Aktas, Mehmet S.
2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020: 2049 - 2056
[40] Spark-based parallel dynamic programming and particle swarm optimization via cloud computing for a large-scale reservoir system
Ma, Yufei
Zhong, Ping-an
Xu, Bin
Zhu, Feilin
Lu, Qingwen
Wang, Han
JOURNAL OF HYDROLOGY, 2021, 598
