In-memory Distributed Matrix Computation Processing and Optimization

Cited by: 19
Authors
Yu, Yongyang [1]
Tang, Mingjie [1]
Aref, Walid G. [1]
Malluhi, Qutaibah M. [2]
Abbas, Mostafa M. [3]
Ouzzani, Mourad [3]
Affiliations
[1] Purdue Univ, W Lafayette, IN 47907 USA
[2] Qatar Univ, Doha, Qatar
[3] HBKU, Qatar Comp Res Inst, Doha, Qatar
Funding
US National Science Foundation (NSF)
Keywords
Matrix computation; query optimization; distributed computing;
DOI
10.1109/ICDE.2017.150
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The use of large-scale machine learning and data mining methods is becoming ubiquitous in many application domains ranging from business intelligence and bioinformatics to self-driving cars. These methods heavily rely on matrix computations, and it is hence critical to make these computations scalable and efficient. These matrix computations are often complex and involve multiple steps that need to be optimized and sequenced properly for efficient execution. This paper presents new efficient and scalable matrix processing and optimization techniques for in-memory distributed clusters. The proposed techniques estimate the sparsity of intermediate matrix-computation results and optimize communication costs. An evaluation plan generator for complex matrix computations is introduced as well as a distributed plan optimizer that exploits dynamic cost-based analysis and rule-based heuristics to optimize the cost of matrix computations in an in-memory distributed environment. The result of a matrix operation will often serve as an input to another matrix operation, thus defining the matrix data dependencies within a matrix program. The matrix query plan generator produces query execution plans that minimize memory usage and communication overhead by partitioning the matrix based on the data dependencies in the execution plan. We implemented the proposed matrix processing and optimization techniques in Spark, a distributed in-memory computing platform. Experiments on both real and synthetic data demonstrate that our proposed techniques achieve up to an order-of-magnitude performance improvement over state-of-the-art distributed matrix computation systems on a wide range of applications.
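The abstract mentions estimating the sparsity of intermediate results and using cost-based analysis to choose an evaluation plan. The sketch below illustrates that general idea only; it is not the paper's algorithm. It assumes a standard textbook estimator in which nonzeros are placed independently and uniformly, so the product of an m x k matrix of density p and a k x n matrix of density q has expected density 1 - (1 - pq)^k. The names `MatrixStats`, `estimate_product`, and `best_order_for_three` are hypothetical, and communication cost is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatrixStats:
    """Shape and nonzero fraction of a matrix (hypothetical helper type)."""
    rows: int
    cols: int
    density: float  # fraction of nonzero entries, in [0, 1]


def estimate_product(a: MatrixStats, b: MatrixStats):
    """Estimate stats of a @ b and the expected number of scalar multiplications.

    Assumption: nonzeros are independent and uniformly distributed, so an
    output cell is zero only if all k partial products are zero.
    """
    if a.cols != b.rows:
        raise ValueError("inner dimensions do not match")
    k = a.cols
    density = 1.0 - (1.0 - a.density * b.density) ** k
    # Each of the rows * k * cols index triples contributes a multiplication
    # only when both operands are nonzero.
    cost = a.rows * k * b.cols * a.density * b.density
    return MatrixStats(a.rows, b.cols, density), cost


def best_order_for_three(a: MatrixStats, b: MatrixStats, c: MatrixStats):
    """Compare (A*B)*C with A*(B*C) using the estimated costs."""
    ab, cost_ab = estimate_product(a, b)
    _, cost_ab_c = estimate_product(ab, c)
    left = cost_ab + cost_ab_c

    bc, cost_bc = estimate_product(b, c)
    _, cost_a_bc = estimate_product(a, bc)
    right = cost_bc + cost_a_bc

    return ("(A*B)*C", left) if left <= right else ("A*(B*C)", right)


if __name__ == "__main__":
    a = MatrixStats(1000, 10, 1.0)
    b = MatrixStats(10, 1000, 1.0)
    c = MatrixStats(1000, 1, 1.0)
    print(best_order_for_three(a, b, c))
```

In this toy model, multiplying the tall-thin and short-wide factors first creates a large dense intermediate, so the estimator prefers the other order. A real distributed optimizer would also weigh partitioning and shuffle costs, which this sketch leaves out.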
Pages: 1047-1058
Page count: 12