Optimizing Interactive Development of Data-Intensive Applications

Times cited: 10

Authors:

Interlandi, Matteo [1]

Tetali, Sai Deep [1, 2]

Gulzar, Muhammad Ali [1]

Noor, Joseph [1]

Condie, Tyson [1]

Kim, Miryung [1]

Millstein, Todd [1]

Affiliations:

[1] Univ Calif Los Angeles, Los Angeles, CA 90024 USA

[2] Google Inc, Menlo Pk, CA USA

Source:

PROCEEDINGS OF THE SEVENTH ACM SYMPOSIUM ON CLOUD COMPUTING (SOCC 2016) | 2016

Keywords:

Query Rewriting; Incremental Evaluation; Spark; Interactive Development; Big Data;

DOI:

10.1145/2987550.2987565

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Modern Data-Intensive Scalable Computing (DISC) systems are designed to process data through batch jobs that execute programs (e.g., queries) compiled from a high-level language. These programs are often developed interactively by posing ad-hoc queries over the base data until a desired result is generated. We observe that there can be significant overlap in the structure of these queries used to derive the final program. Yet, each successive execution of a slightly modified query is performed anew, which can significantly lengthen the development cycle. VEGA is an Apache Spark framework that we have implemented for optimizing a series of similar Spark programs, likely originating from a development or exploratory data analysis session. Spark developers (e.g., data scientists) can leverage VEGA to significantly reduce the amount of time it takes to re-execute a modified Spark program, reducing the overall time to market for their Big Data applications.


Pages: 510-522

Number of pages: 13
