Spark-GPU: An Accelerated In-Memory Data Processing Engine on Clusters

Times cited: 0

Authors:

Yuan, Yuan [1]

Salmi, Meisam Fathi [2]

Huai, Yin [3]

Wang, Kaibo [4]

Lee, Rubao [1]

Zhang, Xiaodong [1]

Affiliations:

[1] Ohio State Univ, Columbus, OH 43210 USA

[2] Paypal Inc, San Jose, CA USA

[3] Databricks Inc, San Francisco, CA USA

[4] Google Inc, Menlo Pk, CA USA

Source:

2016 IEEE International Conference on Big Data (Big Data) | 2016

Funding:

U.S. National Science Foundation

Keywords: none listed

DOI: not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

Apache Spark is an in-memory data processing system that supports both SQL queries and advanced analytics over large data sets. In this paper, we present the design and implementation of Spark-GPU, which enables Spark to exploit the massively parallel processing capability of GPUs to achieve both high performance and high throughput. Spark-GPU turns a general-purpose data processing system into a GPU-supported system by addressing several real-world technical challenges: minimizing internal and external data transfers, preparing a suitable data format and a batching mode for efficient GPU execution, and determining which workloads suit the GPU, backed by a task scheduling capability between CPU and GPU. We comprehensively evaluate Spark-GPU on a set of representative analytical workloads to demonstrate its effectiveness. Our results show that Spark-GPU improves the performance of machine learning workloads by up to 16.13x and the performance of SQL queries by up to 4.83x.
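The abstract identifies data format, batching, and CPU/GPU placement as core challenges but gives no implementation details. The Python sketch below illustrates two generic ideas under stated assumptions and is not Spark-GPU's actual code or API:

- It regroups a row iterator into fixed-size, column-oriented batches, a layout GPUs generally process more efficiently than individual row objects.
- It applies a simple placement rule that offloads a batch only when the estimated GPU kernel time plus the host-to-device transfer time beats the estimated CPU time.

The names `to_column_batches` and `prefer_gpu`, the two-column schema, and all cost parameters are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical two-column schema used only for illustration: (key, value).
Row = Tuple[int, float]


def to_column_batches(rows: Iterable[Row], batch_size: int) -> Iterator[Dict[str, List]]:
    """Regroup row-oriented records into fixed-size column-oriented batches.

    Each batch holds one list per column, so a whole column can be copied
    to a device as one contiguous buffer instead of row by row.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    keys: List[int] = []
    values: List[float] = []
    for key, value in rows:
        keys.append(key)
        values.append(value)
        if len(keys) == batch_size:
            yield {"key": keys, "value": values}
            keys, values = [], []  # start fresh lists; the yielded ones are untouched
    if keys:  # emit the final, partially filled batch
        yield {"key": keys, "value": values}


def prefer_gpu(batch_bytes: int, cpu_seconds: float, gpu_kernel_seconds: float,
               pcie_bytes_per_second: float) -> bool:
    """Hypothetical placement rule: offload only if GPU compute plus the
    host-to-device transfer of the batch is estimated to beat the CPU."""
    transfer_seconds = batch_bytes / pcie_bytes_per_second
    return gpu_kernel_seconds + transfer_seconds < cpu_seconds


if __name__ == "__main__":
    rows = [(i, i * 0.5) for i in range(5)]
    for batch in to_column_batches(rows, 2):
        print(batch)
    # Small batch: 1 ms of transfer keeps the GPU path cheaper than the CPU.
    print(prefer_gpu(12_000_000, 0.05, 0.005, 12e9))
    # Large batch: 100 ms of transfer outweighs the faster kernel.
    print(prefer_gpu(1_200_000_000, 0.05, 0.005, 12e9))
```

The transfer term in the placement rule mirrors the abstract's emphasis on minimizing data transfers. A fast kernel can still lose to the CPU if its input must first cross the PCIe bus.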

Pages: 273-283

Number of pages: 11