Spark-GPU: An Accelerated In-Memory Data Processing Engine on Clusters

Authors
Yuan, Yuan [1]
Salmi, Meisam Fathi [2]
Huai, Yin [3]
Wang, Kaibo [4]
Lee, Rubao [1]
Zhang, Xiaodong [1]
Affiliations
[1] Ohio State Univ, Columbus, OH 43210 USA
[2] Paypal Inc, San Jose, CA USA
[3] Databricks Inc, San Francisco, CA USA
[4] Google Inc, Menlo Pk, CA USA
Funding
National Science Foundation (United States)
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Apache Spark is an in-memory data processing system that supports both SQL queries and advanced analytics over large data sets. In this paper, we present our design and implementation of Spark-GPU that enables Spark to utilize GPU's massively parallel processing ability to achieve both high performance and high throughput. Spark-GPU transforms a general-purpose data processing system into a GPU-supported system by addressing several real-world technical challenges including minimizing internal and external data transfers, preparing a suitable data format and a batching mode for efficient GPU execution, and determining the suitability of workloads for GPU with a task scheduling capability between CPU and GPU. We have comprehensively evaluated Spark-GPU with a set of representative analytical workloads to show its effectiveness. Our results show that Spark-GPU improves the performance of machine learning workloads by up to 16.13x and the performance of SQL queries by up to 4.83x.
Pages: 273-283
Page count: 11
Related papers
50 in total
  • [1] GPU in-memory processing using Spark for iterative computation
    Hong, Sumin
    Choi, Woohyuk
    Jeong, Won-Ki
    [J]. 2017 17TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), 2017, : 31 - 41
  • [2] DISE: A Distributed in-Memory SPARQL Processing Engine over Tensor Data
    Jabeen, Hajira
    Haziiev, Eskender
    Sejdiu, Gezim
    Lehmann, Jens
    [J]. 2020 IEEE 14TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC 2020), 2020, : 400 - 407
  • [3] GFlink: An In-Memory Computing Architecture on Heterogeneous CPU-GPU Clusters for Big Data
    Chen, Cen
    Li, Kenli
    Ouyang, Aijia
    Tang, Zhuo
    Li, Keqin
    [J]. PROCEEDINGS 45TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING - ICPP 2016, 2016, : 542 - 551
  • [4] GFlink: An In-Memory Computing Architecture on Heterogeneous CPU-GPU Clusters for Big Data
    Chen, Cen
    Li, Kenli
    Ouyang, Aijia
    Zeng, Zeng
    Li, Keqin
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2018, 29 (06) : 1275 - 1288
  • [5] Ignite-GPU: a GPU-enabled in-memory computing architecture on clusters
    Sojoodi, Amir Hossein
    Salimi Beni, Majid
    Khunjush, Farshad
    [J]. JOURNAL OF SUPERCOMPUTING, 2021, 77 (03) : 3165 - 3192
  • [6] Ignite-GPU: a GPU-enabled in-memory computing architecture on clusters
    Amir Hossein Sojoodi
    Majid Salimi Beni
    Farshad Khunjush
    [J]. The Journal of Supercomputing, 2021, 77 : 3165 - 3192
  • [7] Mille Cheval: a GPU-based in-memory high-performance computing framework for accelerated processing of big-data streams
    Kumar, Vivek
    Sharma, Dilip Kumar
    Mishra, Vinay Kumar
    [J]. JOURNAL OF SUPERCOMPUTING, 2021, 77 (07) : 6936 - 6960
  • [8] Mille Cheval: a GPU-based in-memory high-performance computing framework for accelerated processing of big-data streams
    Vivek Kumar
    Dilip Kumar Sharma
    Vinay Kumar Mishra
    [J]. The Journal of Supercomputing, 2021, 77 : 6936 - 6960
  • [9] In-Memory Data Processing for Sales Planning
    Hrubaru, Ionut
    [J]. INNOVATION MANAGEMENT AND EDUCATION EXCELLENCE THROUGH VISION 2020, VOLS I -XI, 2018, : 2582 - 2588
  • [10] Adaptive in-memory representation of decision trees for GPU-accelerated evolutionary induction
    Jurczuk, Krzysztof
    Czajkowski, Marcin
    Kretowski, Marek
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2024, 153 : 419 - 430