In-memory Distributed Matrix Computation Processing and Optimization

Cited by: 20
Authors
Yu, Yongyang [1]
Tang, Mingjie [1]
Aref, Walid G. [1]
Malluhi, Qutaibah M. [2]
Abbas, Mostafa M. [3]
Ouzzani, Mourad [3]
Affiliations
[1] Purdue Univ, W Lafayette, IN 47907 USA
[2] Qatar Univ, Doha, Qatar
[3] HBKU, Qatar Comp Res Inst, Doha, Qatar
Funding
National Science Foundation (USA);
Keywords
Matrix computation; query optimization; distributed computing;
DOI
10.1109/ICDE.2017.150
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The use of large-scale machine learning and data mining methods is becoming ubiquitous in many application domains ranging from business intelligence and bioinformatics to self-driving cars. These methods heavily rely on matrix computations, and it is hence critical to make these computations scalable and efficient. These matrix computations are often complex and involve multiple steps that need to be optimized and sequenced properly for efficient execution. This paper presents new efficient and scalable matrix processing and optimization techniques for in-memory distributed clusters. The proposed techniques estimate the sparsity of intermediate matrix-computation results and optimize communication costs. An evaluation plan generator for complex matrix computations is introduced as well as a distributed plan optimizer that exploits dynamic cost-based analysis and rule-based heuristics to optimize the cost of matrix computations in an in-memory distributed environment. The result of a matrix operation will often serve as an input to another matrix operation, thus defining the matrix data dependencies within a matrix program. The matrix query plan generator produces query execution plans that minimize memory usage and communication overhead by partitioning the matrix based on the data dependencies in the execution plan. We implemented the proposed matrix processing and optimization techniques in Spark, a distributed in-memory computing platform. Experiments on both real and synthetic data demonstrate that our proposed techniques achieve up to an order-of-magnitude performance improvement over state-of-the-art distributed matrix computation systems on a wide range of applications.
Pages: 1047 - 1058
Number of pages: 12
Related papers
50 in total
  • [41] In-Memory Data Processing for Sales Planning
    Hrubaru, Ionut
    INNOVATION MANAGEMENT AND EDUCATION EXCELLENCE THROUGH VISION 2020, VOLS I -XI, 2018, : 2582 - 2588
  • [42] Scalable in-memory processing of omics workflows
    Elisseev, Vadim
    Gardiner, Laura-Jayne
    Krishna, Ritesh
    Computational and Structural Biotechnology Journal, 2022, 20 : 1914 - 1924
  • [43] Efficient In-Memory Processing Using Spintronics
    Chowdhury, Zamshed
    Harms, Jonathan D.
    Khatamifard, S. Karen
    Zabihi, Masoud
    Lv, Yang
    Lyle, Andrew P.
    Sapatnekar, Sachin S.
    Karpuzcu, Ulya R.
    Wang, Jian-Ping
    IEEE COMPUTER ARCHITECTURE LETTERS, 2018, 17 (01) : 42 - 46
  • [44] Scalable in-memory processing of omics workflows
    Elisseev, Vadim
    Gardiner, Laura-Jayne
    Krishna, Ritesh
    COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2022, 20 : 1914 - 1924
  • [45] Exploring Processing In-Memory for Different Technologies
    Gupta, Saransh
    Imani, Mohsen
    Rosing, Tajana
    GLSVLSI '19 - PROCEEDINGS OF THE 2019 ON GREAT LAKES SYMPOSIUM ON VLSI, 2019, : 201 - 206
  • [46] Multi-Layer In-Memory Processing
    Fujiki, Daichi
    Khadem, Alireza
    Mahlke, Scott
    Das, Reetuparna
    2022 55TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO), 2022, : 920 - 936
  • [47] Scalable In-Memory Transaction Processing with HTM
    Wu, Yingjun
    Tan, Kian-Lee
    PROCEEDINGS OF USENIX ATC '16: 2016 USENIX ANNUAL TECHNICAL CONFERENCE, 2016, : 365 - 377
  • [48] IMAGING: In-Memory AlGorithms for Image processiNG
    Haj-Ali, Ameer
    Ben-Hur, Rotem
    Wald, Nimrod
    Ronen, Ronny
    Kvatinsky, Shahar
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2018, 65 (12) : 4258 - 4271
  • [49] Special Issue on Near-Memory and In-Memory Processing
    Pande, Partha Pratim
    IEEE DESIGN & TEST, 2022, 39 (02) : 4 - 4
  • [50] Performance Enhancement of Distributed K-Means Clustering for Big Data Analytics Through In-memory Computation
    Ketu, Shwet
    Agarwal, Sonali
    2015 EIGHTH INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING (IC3), 2015, : 318 - 324