Spark-Based Large-Scale Matrix Inversion for Big Data Processing

Cited by: 34
Authors
Liu, Jun [1]
Liang, Yang [1]
Ansari, Nirwan [2]
Affiliations
[1] Beijing Univ Posts & Telecommun, Ctr Data Sci, Beijing 100876, Peoples R China
[2] New Jersey Inst Technol, Dept Elect & Comp Engn, Newark, NJ 07102 USA
Source
IEEE Access | 2016 / Vol. 4
Keywords
Matrix inversion; LU decomposition; linear algebra; parallel algorithm; distributed computing; Spark
DOI
10.1109/ACCESS.2016.2546544
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Matrix inversion is a fundamental operation for solving linear equations in many computational applications, especially in emerging big data applications. However, it is challenging to invert large-scale matrices of extremely high order (several thousand to millions), which are common in most Web-scale systems, such as social networks and recommendation systems. In this paper, we present a lower-upper (LU) decomposition-based block-recursive algorithm for large-scale matrix inversion. We present its implementation on the Spark parallel computing platform, with an optimized data structure, reduced space complexity, and effective matrix multiplication. The experimental evaluation results show that the proposed algorithm efficiently inverts large-scale matrices on a cluster composed of commodity servers and is scalable to even larger matrices. The proposed algorithm and implementation will become a solid foundation for building a high-performance linear algebra library on Spark for big data processing and applications.
Pages: 2166-2176
Number of pages: 11
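
The abstract names an LU decomposition-based block-recursive inversion but gives no details, so the following is only a minimal single-machine NumPy sketch of that general idea, not the paper's Spark implementation. It assumes the matrix can be factored without pivoting (for example, a strictly diagonally dominant matrix). All function names and the leaf parameter are hypothetical choices made for illustration.

```python
import numpy as np


def lu_nopivot(a):
    """Doolittle LU (no pivoting) for a small dense block: a = l @ u."""
    n = a.shape[0]
    l = np.eye(n)
    u = a.astype(float).copy()
    for k in range(n - 1):
        factors = u[k + 1:, k] / u[k, k]  # assumes a nonzero pivot
        l[k + 1:, k] = factors
        u[k + 1:, k:] -= np.outer(factors, u[k, k:])
    return l, u


def inv_lower(l, leaf=2):
    """Invert a lower-triangular matrix by recursing on its 2x2 block form."""
    n = l.shape[0]
    if n <= leaf:
        return np.linalg.inv(l)
    h = n // 2
    top = inv_lower(l[:h, :h], leaf)
    bottom = inv_lower(l[h:, h:], leaf)
    out = np.zeros((n, n))
    out[:h, :h] = top
    out[h:, h:] = bottom
    out[h:, :h] = -bottom @ l[h:, :h] @ top
    return out


def inv_upper(u, leaf=2):
    """Invert an upper-triangular matrix via the transpose of the lower case."""
    return inv_lower(u.T, leaf).T


def block_lu(a, leaf=2):
    """Block-recursive LU: factor A11, derive U12 and L21, recurse on the Schur complement."""
    n = a.shape[0]
    if n <= leaf:
        return lu_nopivot(a)
    h = n // 2
    l11, u11 = block_lu(a[:h, :h], leaf)
    u12 = inv_lower(l11, leaf) @ a[:h, h:]
    l21 = a[h:, :h] @ inv_upper(u11, leaf)
    l22, u22 = block_lu(a[h:, h:] - l21 @ u12, leaf)
    l = np.zeros((n, n))
    u = np.zeros((n, n))
    l[:h, :h], l[h:, :h], l[h:, h:] = l11, l21, l22
    u[:h, :h], u[:h, h:], u[h:, h:] = u11, u12, u22
    return l, u


def block_inverse(a, leaf=2):
    """Invert A as inv(U) @ inv(L), where A = L @ U comes from block_lu."""
    l, u = block_lu(np.asarray(a, dtype=float), leaf)
    return inv_upper(u, leaf) @ inv_lower(l, leaf)


# Usage: a strictly diagonally dominant test matrix, so no pivoting is needed.
rng = np.random.default_rng(0)
a = rng.random((7, 7)) + 7.0 * np.eye(7)
a_inv = block_inverse(a)
```

In a distributed setting, the matrix products in block_lu and inv_lower are the natural candidates for parallel execution; how the paper actually distributes them on Spark is not stated in this record.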