Spark-Based Large-Scale Matrix Inversion for Big Data Processing

Cited by: 34

Authors:

Liu, Jun [1]

Liang, Yang [1]

Ansari, Nirwan [2]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Ctr Data Sci, Beijing 100876, Peoples R China

[2] New Jersey Inst Technol, Dept Elect & Comp Engn, Newark, NJ 07102 USA

Source:

IEEE ACCESS | 2016 / Vol. 4

Keywords:

Matrix inversion; LU decomposition; linear algebra; parallel algorithm; distributed computing; Spark

DOI:

10.1109/ACCESS.2016.2546544

Chinese Library Classification number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Matrix inversion is a fundamental operation for solving linear equations in many computational applications, especially in emerging big data applications. However, it is challenging to invert large-scale matrices of extremely high order (several thousands or millions), which are common in most Web-scale systems, such as social networks and recommendation systems. In this paper, we present a lower-upper (LU) decomposition-based block-recursive algorithm for large-scale matrix inversion. We present its implementation on the Spark parallel computing platform, with an optimized data structure, reduced space complexity, and effective matrix multiplication. The experimental evaluation results show that the proposed algorithm inverts large-scale matrices efficiently on a cluster of commodity servers and scales to even larger matrices. The proposed algorithm and implementation will become a solid foundation for building a high-performance linear algebra library on Spark for big data processing and applications.

Pages: 2166 - 2176

Number of pages: 11

Related items (50 in total):

[21] A Spark-based parallel framework for geospatial raster data processing
Gao, Fan
Yue, Peng
2018 7TH INTERNATIONAL CONFERENCE ON AGRO-GEOINFORMATICS (AGRO-GEOINFORMATICS), 2018, : 53 - 56
[22] BDPS: An Efficient Spark-Based Big Data Processing Scheme for Cloud Fog-IoT Orchestration
Hossen, Rakib
Whaiduzzaman, Md
Uddin, Mohammed Nasir
Islam, Md. Jahidul
Faruqui, Nuruzzaman
Barros, Alistair
Sookhak, Mehdi
Mahi, Md. Julkar Nayeen
INFORMATION, 2021, 12 (12)
[23] Spark-based data analytics of sequence motifs in large omics data
Sarumi, Oluwafemi A.
Leung, Carson K.
Adetunmbi, Adebayo O.
KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS (KES-2018), 2018, 126 : 596 - 605
[24] The Research of Large Scale Data Processing Platform Based on the Spark
Na, Chu
Xin, Cao
2016 INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION, BIG DATA & SMART CITY (ICITBS), 2017, : 293 - 296
[25] Large-scale inversion of ZTEM data
Holtham, Elliot
Oldenburg, Douglas W.
GEOPHYSICS, 2012, 77 (04) : WB37 - WB45
[26] FAIRly big: A framework for computationally reproducible processing of large-scale data
Adina S. Wagner
Laura K. Waite
Małgorzata Wierzba
Felix Hoffstaedter
Alexander Q. Waite
Benjamin Poldrack
Simon B. Eickhoff
Michael Hanke
Scientific Data, 9
[27] FAIRly big: A framework for computationally reproducible processing of large-scale data
Wagner, Adina S.
Waite, Laura K.
Wierzba, Malgorzata
Hoffstaedter, Felix
Waite, Alexander Q.
Poldrack, Benjamin
Eickhoff, Simon B.
Hanke, Michael
SCIENTIFIC DATA, 2022, 9 (01)
[28] A Spark-based Analytic Pipeline for Seizure Detection in EEG Big Data Streams
Sendi, Mohammad S. E.
Heydarzadeh, Mehrdad
Mahmoudi, Babak
2018 40TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2018, : 4003 - 4006
[29] Large-Scale Data Pollution with Apache Spark
Hildebrandt, Kai
Panse, Fabian
Wilcke, Niklas
Ritter, Norbert
IEEE TRANSACTIONS ON BIG DATA, 2020, 6 (02) : 396 - 411
[30] An Efficient Spark-Based Hybrid Frequent Itemset Mining Algorithm for Big Data
Al-Bana, Mohamed Reda
Farhan, Marwa Salah
Othman, Nermin Abdelhakim
DATA, 2022, 7 (01)