In-memory Distributed Matrix Computation Processing and Optimization

Cited by: 20
Authors
Yu, Yongyang [1]
Tang, Mingjie [1]
Aref, Walid G. [1]
Malluhi, Qutaibah M. [2]
Abbas, Mostafa M. [3]
Ouzzani, Mourad [3]
Affiliations
[1] Purdue Univ, W Lafayette, IN 47907 USA
[2] Qatar Univ, Doha, Qatar
[3] HBKU, Qatar Comp Res Inst, Doha, Qatar
Funding
U.S. National Science Foundation
Keywords
Matrix computation; query optimization; distributed computing;
DOI
10.1109/ICDE.2017.150
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The use of large-scale machine learning and data mining methods is becoming ubiquitous in many application domains ranging from business intelligence and bioinformatics to self-driving cars. These methods heavily rely on matrix computations, and it is hence critical to make these computations scalable and efficient. These matrix computations are often complex and involve multiple steps that need to be optimized and sequenced properly for efficient execution. This paper presents new efficient and scalable matrix processing and optimization techniques for in-memory distributed clusters. The proposed techniques estimate the sparsity of intermediate matrix-computation results and optimize communication costs. An evaluation plan generator for complex matrix computations is introduced as well as a distributed plan optimizer that exploits dynamic cost-based analysis and rule-based heuristics to optimize the cost of matrix computations in an in-memory distributed environment. The result of a matrix operation will often serve as an input to another matrix operation, thus defining the matrix data dependencies within a matrix program. The matrix query plan generator produces query execution plans that minimize memory usage and communication overhead by partitioning the matrix based on the data dependencies in the execution plan. We implemented the proposed matrix processing and optimization techniques in Spark, a distributed in-memory computing platform. Experiments on both real and synthetic data demonstrate that our proposed techniques achieve up to an order-of-magnitude performance improvement over state-of-the-art distributed matrix computation systems on a wide range of applications.
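The abstract states that the proposed techniques estimate the sparsity of intermediate matrix-computation results, but it does not give the estimator. As a rough illustration only, the sketch below uses a common textbook estimate that assumes nonzeros are placed independently and uniformly at random; this is an assumption and not necessarily the method used in the paper. The function names `estimate_product_density` and `estimate_product_nnz` are hypothetical.

```python
def estimate_product_density(density_a: float, density_b: float, inner_dim: int) -> float:
    """Estimate the density (fraction of nonzeros) of C = A x B.

    Assumption (not taken from the paper): nonzeros in A and B are
    independent and uniformly distributed. An output cell is zero only if
    all `inner_dim` pairwise products along the shared dimension are zero,
    so its probability of being nonzero is 1 - (1 - pa * pb) ** k.
    """
    if not (0.0 <= density_a <= 1.0 and 0.0 <= density_b <= 1.0):
        raise ValueError("densities must lie in [0, 1]")
    if inner_dim < 0:
        raise ValueError("inner_dim must be non-negative")
    return 1.0 - (1.0 - density_a * density_b) ** inner_dim


def estimate_product_nnz(rows: int, inner_dim: int, cols: int,
                         density_a: float, density_b: float) -> float:
    """Estimated number of nonzeros in a (rows x inner_dim) by (inner_dim x cols) product."""
    return rows * cols * estimate_product_density(density_a, density_b, inner_dim)


if __name__ == "__main__":
    # Two 1%-dense matrices sharing an inner dimension of 100.
    print(estimate_product_density(0.01, 0.01, 100))
    print(estimate_product_nnz(1000, 100, 1000, 0.01, 0.01))
```

An optimizer could use an estimate of this kind to compare candidate plans, for example to decide whether an intermediate result is likely to stay sparse enough to keep in a sparse format before it is passed to the next operation.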
Pages: 1047-1058
Number of pages: 12