In-memory Distributed Matrix Computation Processing and Optimization

Cited by: 20
Authors
Yu, Yongyang [1]
Tang, Mingjie [1]
Aref, Walid G. [1]
Malluhi, Qutaibah M. [2]
Abbas, Mostafa M. [3]
Ouzzani, Mourad [3]
Affiliations
[1] Purdue Univ, W Lafayette, IN 47907 USA
[2] Qatar Univ, Doha, Qatar
[3] HBKU, Qatar Comp Res Inst, Doha, Qatar
Funding
National Science Foundation (USA)
Keywords
Matrix computation; query optimization; distributed computing
DOI
10.1109/ICDE.2017.150
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The use of large-scale machine learning and data mining methods is becoming ubiquitous in many application domains ranging from business intelligence and bioinformatics to self-driving cars. These methods heavily rely on matrix computations, and it is hence critical to make these computations scalable and efficient. These matrix computations are often complex and involve multiple steps that need to be optimized and sequenced properly for efficient execution. This paper presents new efficient and scalable matrix processing and optimization techniques for in-memory distributed clusters. The proposed techniques estimate the sparsity of intermediate matrix-computation results and optimize communication costs. An evaluation plan generator for complex matrix computations is introduced as well as a distributed plan optimizer that exploits dynamic cost-based analysis and rule-based heuristics to optimize the cost of matrix computations in an in-memory distributed environment. The result of a matrix operation will often serve as an input to another matrix operation, thus defining the matrix data dependencies within a matrix program. The matrix query plan generator produces query execution plans that minimize memory usage and communication overhead by partitioning the matrix based on the data dependencies in the execution plan. We implemented the proposed matrix processing and optimization techniques in Spark, a distributed in-memory computing platform. Experiments on both real and synthetic data demonstrate that our proposed techniques achieve up to an order-of-magnitude performance improvement over state-of-the-art distributed matrix computation systems on a wide range of applications.
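The abstract mentions estimating the sparsity of intermediate results and using cost-based analysis to choose an evaluation plan. The sketch below illustrates that general idea only; it is not the paper's method. It assumes nonzeros are placed independently and uniformly at random (a common textbook estimator), and the function names `estimate_product_density`, `estimate_multiply_cost`, and `plan_three_way_product` are hypothetical.

```python
def estimate_product_density(density_a, density_b, inner_dim):
    """Expected density of A @ B, ASSUMING independent, uniformly placed nonzeros.

    An output cell is zero only if all `inner_dim` term products are zero,
    and each term product is nonzero with probability density_a * density_b.
    """
    for d in (density_a, density_b):
        if not 0.0 <= d <= 1.0:
            raise ValueError("densities must lie in [0, 1]")
    if inner_dim < 0:
        raise ValueError("inner_dim must be non-negative")
    return 1.0 - (1.0 - density_a * density_b) ** inner_dim


def estimate_multiply_cost(m, k, n, density_a, density_b):
    """Expected number of scalar multiplications for (m x k) @ (k x n)."""
    return m * k * n * density_a * density_b


def plan_three_way_product(shapes, densities):
    """Pick the cheaper order for A @ B @ C under the assumptions above.

    shapes:    [(m, k1), (k1, k2), (k2, n)]
    densities: [d_a, d_b, d_c], each the fraction of nonzero entries
    Returns (plan_name, estimated_cost).
    """
    (m, k1), (_, k2), (_, n) = shapes
    d_a, d_b, d_c = densities

    # Plan 1: (A B) C; the intermediate A B is m x k2.
    d_ab = estimate_product_density(d_a, d_b, k1)
    left = (estimate_multiply_cost(m, k1, k2, d_a, d_b)
            + estimate_multiply_cost(m, k2, n, d_ab, d_c))

    # Plan 2: A (B C); the intermediate B C is k1 x n.
    d_bc = estimate_product_density(d_b, d_c, k2)
    right = (estimate_multiply_cost(k1, k2, n, d_b, d_c)
             + estimate_multiply_cost(m, k1, n, d_a, d_bc))

    return ("(AB)C", left) if left <= right else ("A(BC)", right)


if __name__ == "__main__":
    shapes = [(1000, 10), (10, 1000), (1000, 10)]
    print(plan_three_way_product(shapes, [1.0, 1.0, 1.0]))
```

A real optimizer would also need to account for communication and partitioning costs, which the abstract names as optimization targets; this sketch counts arithmetic only.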
Pages: 1047-1058
Number of pages: 12