Spark-Based Large-Scale Matrix Inversion for Big Data Processing

Cited by: 34
Authors
Liu, Jun [1]
Liang, Yang [1]
Ansari, Nirwan [2]
Affiliations
[1] Beijing Univ Posts & Telecommun, Ctr Data Sci, Beijing 100876, Peoples R China
[2] New Jersey Inst Technol, Dept Elect & Comp Engn, Newark, NJ 07102 USA
Source
IEEE ACCESS | 2016 / Vol. 4
Keywords
Matrix inversion; LU decomposition; linear algebra; parallel algorithm; distributed computing; Spark;
DOI
10.1109/ACCESS.2016.2546544
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Matrix inversion is a fundamental operation for solving linear equations in many computational applications, especially in emerging big data applications. However, it is challenging to invert large-scale matrices of extremely high order (in the thousands or millions), which are common in most Web-scale systems, such as social networks and recommendation systems. In this paper, we present a lower-upper (LU) decomposition-based block-recursive algorithm for large-scale matrix inversion. We present its implementation on the Spark parallel computing platform, with an optimized data structure, reduced space complexity, and effective matrix multiplication. The experimental evaluation results show that the proposed algorithm efficiently inverts large-scale matrices on a cluster composed of commodity servers and scales to even larger matrices. The proposed algorithm and implementation will become a solid foundation for building a high-performance linear algebra library on Spark for big data processing and applications.
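The abstract names the technique but this record does not reproduce the algorithm itself. As a rough illustration only, the following single-machine Python sketch shows one way an LU decomposition-based block-recursive inversion can be organized: split the matrix into four blocks, recurse on the top-left block and on the Schur complement, and combine the inverses of the triangular factors. It is not the authors' Spark implementation. It assumes no pivoting is needed (every leading principal block is nonsingular), and all function names (`matmul`, `lu_inverses`, `invert`, and the helpers) are hypothetical.

```python
def matmul(X, Y):
    """Dense product of two list-of-lists matrices."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def sub(X, Y):
    """Element-wise difference X - Y."""
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def neg(X):
    """Element-wise negation."""
    return [[-a for a in row] for row in X]


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


def assemble(TL, TR, BL, BR):
    """Glue four blocks back into one matrix."""
    top = [left + right for left, right in zip(TL, TR)]
    bottom = [left + right for left, right in zip(BL, BR)]
    return top + bottom


def lu_inverses(A):
    """Return (L^-1, U^-1) for A = L U, with L unit lower triangular.

    Assumption of this sketch: no pivoting, so every pivot must be nonzero.
    """
    n = len(A)
    if n == 1:
        pivot = A[0][0]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot: this sketch does not pivot")
        return [[1.0]], [[1.0 / pivot]]

    h = n // 2
    A11 = [row[:h] for row in A[:h]]
    A12 = [row[h:] for row in A[:h]]
    A21 = [row[:h] for row in A[h:]]
    A22 = [row[h:] for row in A[h:]]

    # Recurse on the top-left block: A11 = L11 U11.
    L11i, U11i = lu_inverses(A11)
    U12 = matmul(L11i, A12)          # U12 = L11^-1 A12
    L21 = matmul(A21, U11i)          # L21 = A21 U11^-1
    S = sub(A22, matmul(L21, U12))   # Schur complement, S = L22 U22

    # Recurse on the Schur complement.
    L22i, U22i = lu_inverses(S)

    # Block formulas for the inverses of the triangular factors.
    Linv = assemble(L11i, zeros(h, n - h),
                    neg(matmul(matmul(L22i, L21), L11i)), L22i)
    Uinv = assemble(U11i, neg(matmul(matmul(U11i, U12), U22i)),
                    zeros(n - h, h), U22i)
    return Linv, Uinv


def invert(A):
    """A^-1 = U^-1 L^-1."""
    Linv, Uinv = lu_inverses(A)
    return matmul(Uinv, Linv)
```

Calling `invert` on a small, diagonally dominant matrix and multiplying the result by the original should give the identity up to rounding. In a distributed setting the block products are the expensive part, which is presumably why the abstract singles out matrix multiplication as a focus of the implementation.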
Pages: 2166-2176
Number of pages: 11
Related papers
50 in total
  • [21] A Spark-based parallel framework for geospatial raster data processing
    Gao, Fan
    Yue, Peng
    2018 7TH INTERNATIONAL CONFERENCE ON AGRO-GEOINFORMATICS (AGRO-GEOINFORMATICS), 2018, : 53 - 56
  • [22] BDPS: An Efficient Spark-Based Big Data Processing Scheme for Cloud Fog-IoT Orchestration
    Hossen, Rakib
    Whaiduzzaman, Md
    Uddin, Mohammed Nasir
    Islam, Md. Jahidul
    Faruqui, Nuruzzaman
    Barros, Alistair
    Sookhak, Mehdi
    Mahi, Md. Julkar Nayeen
    INFORMATION, 2021, 12 (12)
  • [23] Spark-based data analytics of sequence motifs in large omics data
    Sarumi, Oluwafemi A.
    Leung, Carson K.
    Adetunmbi, Adebayo O.
    KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS (KES-2018), 2018, 126 : 596 - 605
  • [24] The Research of Large Scale Data Processing Platform Based on the Spark
    Na, Chu
    Xin, Cao
    2016 INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION, BIG DATA & SMART CITY (ICITBS), 2017, : 293 - 296
  • [25] Large-scale inversion of ZTEM data
    Holtham, Elliot
    Oldenburg, Douglas W.
    GEOPHYSICS, 2012, 77 (04) : WB37 - WB45
  • [26] FAIRly big: A framework for computationally reproducible processing of large-scale data
    Adina S. Wagner
    Laura K. Waite
    Małgorzata Wierzba
    Felix Hoffstaedter
    Alexander Q. Waite
    Benjamin Poldrack
    Simon B. Eickhoff
    Michael Hanke
    Scientific Data, 9
  • [27] FAIRly big: A framework for computationally reproducible processing of large-scale data
    Wagner, Adina S.
    Waite, Laura K.
    Wierzba, Malgorzata
    Hoffstaedter, Felix
    Waite, Alexander Q.
    Poldrack, Benjamin
    Eickhoff, Simon B.
    Hanke, Michael
    SCIENTIFIC DATA, 2022, 9 (01)
  • [28] A Spark-based Analytic Pipeline for Seizure Detection in EEG Big Data Streams
    Sendi, Mohammad S. E.
    Heydarzadeh, Mehrdad
    Mahmoudi, Babak
    2018 40TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2018, : 4003 - 4006
  • [29] Large-Scale Data Pollution with Apache Spark
    Hildebrandt, Kai
    Panse, Fabian
    Wilcke, Niklas
    Ritter, Norbert
    IEEE TRANSACTIONS ON BIG DATA, 2020, 6 (02) : 396 - 411
  • [30] An Efficient Spark-Based Hybrid Frequent Itemset Mining Algorithm for Big Data
    Al-Bana, Mohamed Reda
    Farhan, Marwa Salah
    Othman, Nermin Abdelhakim
    DATA, 2022, 7 (01)