Spark-GPU: An Accelerated In-Memory Data Processing Engine on Clusters

Cited by: 0
Authors
Yuan, Yuan [1]
Salmi, Meisam Fathi [2]
Huai, Yin [3]
Wang, Kaibo [4]
Lee, Rubao [1]
Zhang, Xiaodong [1]
Affiliations
[1] Ohio State Univ, Columbus, OH 43210 USA
[2] Paypal Inc, San Jose, CA USA
[3] Databricks Inc, San Francisco, CA USA
[4] Google Inc, Menlo Pk, CA USA
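Funding
National Science Foundation (USA)
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract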
Apache Spark is an in-memory data processing system that supports both SQL queries and advanced analytics over large data sets. In this paper, we present our design and implementation of Spark-GPU that enables Spark to utilize GPU's massively parallel processing ability to achieve both high performance and high throughput. Spark-GPU transforms a general-purpose data processing system into a GPU-supported system by addressing several real-world technical challenges including minimizing internal and external data transfers, preparing a suitable data format and a batching mode for efficient GPU execution, and determining the suitability of workloads for GPU with a task scheduling capability between CPU and GPU. We have comprehensively evaluated Spark-GPU with a set of representative analytical workloads to show its effectiveness. Our results show that Spark-GPU improves the performance of machine learning workloads by up to 16.13x and the performance of SQL queries by up to 4.83x.
Pages: 273-283
Page count: 11