Optimizing Interactive Development of Data-Intensive Applications

Cited by: 10
Authors
Interlandi, Matteo [1]
Tetali, Sai Deep [1, 2]
Gulzar, Muhammad Ali [1]
Noor, Joseph [1]
Condie, Tyson [1]
Kim, Miryung [1]
Millstein, Todd [1]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90024 USA
[2] Google Inc, Menlo Pk, CA USA
Keywords
Query Rewriting; Incremental Evaluation; Spark; Interactive Development; Big Data
DOI
10.1145/2987550.2987565
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Modern Data-Intensive Scalable Computing (DISC) systems are designed to process data through batch jobs that execute programs (e.g., queries) compiled from a high-level language. These programs are often developed interactively by posing ad-hoc queries over the base data until a desired result is generated. We observe that there can be significant overlap in the structure of these queries used to derive the final program. Yet, each successive execution of a slightly modified query is performed anew, which can significantly increase the development cycle. VEGA is an Apache Spark framework that we have implemented for optimizing a series of similar Spark programs, likely originating from a development or exploratory data analysis session. Spark developers (e.g., data scientists) can leverage VEGA to significantly reduce the amount of time it takes to re-execute a modified Spark program, reducing the overall time to market for their Big Data applications.
Pages: 510-522
Number of pages: 13
Related papers
50 results in total
  • [1] Transfer scheduling schemes for data-intensive, interactive applications
    Takizawa, Makoto
    Shimizu, Takashi
    Ishida, Osamu
    [J]. GLOBECOM 2007: 2007 IEEE GLOBAL TELECOMMUNICATIONS CONFERENCE, VOLS 1-11, 2007, : 2488 - 2491
  • [2] Optimizing Data-Intensive Applications Automatically By Leveraging Parallel Data Processing Frameworks
    Ahmad, Maaz Bin Safeer
    Cheung, Alvin
    [J]. SIGMOD'17: PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2017, : 1675 - 1678
  • [3] Optimizing Distributed Data-Intensive Workflows
    Friese, Ryan D.
    Tallent, Nathan R.
    Schram, Malachi
    Halappanavar, Mahantesh
    Barker, Kevin J.
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2018, : 279 - 289
  • [4] Model transformations in the development of data-intensive web applications
    Di Ruscio, D
    Pierantonio, A
    [J]. ADVANCED INFORMATION SYSTEMS ENGINEERING, PROCEEDINGS, 2005, 3520 : 475 - 490
  • [5] Applications in Data-Intensive Computing
    Shah, Anuj R.
    Adkins, Joshua N.
    Baxter, Douglas J.
    Cannon, William R.
    Chavarria-Miranda, Daniel G.
    Choudhury, Sutanay
    Gorton, Ian
    Gracio, Deborah K.
    Halter, Todd D.
    Jaitly, Navdeep D.
    Johnson, John R.
    Kouzes, Richard T.
    Macduff, Matthew C.
    Marquez, Andres
    Monroe, Matthew E.
    Oehmen, Christopher S.
    Pike, William A.
    Scherrer, Chad
    Villa, Oreste
    Webb-Robertson, Bobbie-Jo
    Whitney, Paul D.
    Zuljevic, Nino
    [J]. ADVANCES IN COMPUTERS, VOL 79, 2010, 79 : 1 - 70
  • [6] Metacomputing and data-intensive applications
    Messina, P
    [J]. WORLDWIDE COMPUTING AND ITS APPLICATIONS, 1997, 1274 : 226 - 236
  • [7] Data replication techniques for data-intensive applications
    No, Jaechun
    Park, Chang Won
    Park, Sung Soon
    [J]. COMPUTATIONAL SCIENCE - ICCS 2006, PT 4, PROCEEDINGS, 2006, 3994 : 1063 - 1070
  • [8] Analysis of Big Data for Data-Intensive Applications
    Dave, Meenu
    Gianey, Hemant Kumar
    [J]. 2016 INTERNATIONAL CONFERENCE ON RECENT ADVANCES AND INNOVATIONS IN ENGINEERING (ICRAIE), 2016,
  • [9] Managing Data-Intensive Applications in the Cloud
    Pei, Jian
    [J]. COMPUTER, 2014, 47 (07) : 6 - 6
  • [10] Static Analysis of Data-Intensive Applications
    Nagy, Csaba
    [J]. PROCEEDINGS OF THE 17TH EUROPEAN CONFERENCE ON SOFTWARE MAINTENANCE AND REENGINEERING (CSMR 2013), 2013, : 435 - 438