MapReduce Programming and Cost-based Optimization? Crossing this Chasm with Starfish

Authors
Herodotou, Herodotos [1]
Dong, Fei [1]
Babu, Shivnath [1]
Affiliation
[1] Duke Univ, Durham, NC 27706 USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2011 / Vol. 4 / No. 12
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
MapReduce has emerged as a viable competitor to database systems in big data analytics. MapReduce programs are being written for a wide variety of application domains including business data processing, text analysis, natural language processing, Web graph and social network analysis, and computational science. However, MapReduce systems lack a feature that has been key to the historical success of database systems, namely, cost-based optimization. A major challenge here is that, to the MapReduce system, a program consists of black-box map and reduce functions written in some programming language like C++, Java, Python, or Ruby. Starfish is a self-tuning system for big data analytics that includes, to our knowledge, the first Cost-based Optimizer for simple to arbitrarily complex MapReduce programs. Starfish also includes a Profiler to collect detailed statistical information from unmodified MapReduce programs, and a What-if Engine for fine-grained cost estimation. This demonstration will present the profiling, what-if analysis, and cost-based optimization of MapReduce programs in Starfish. We will show how (nonexpert) users can employ the Starfish Visualizer to (a) get a deep understanding of a MapReduce program's behavior during execution, (b) ask hypothetical questions on how the program's behavior will change when parameter settings, cluster resources, or input data properties change, and (c) ultimately optimize the program.
Pages: 1446-1449
Number of pages: 4
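The abstract describes a pipeline in which a Profiler collects statistics from a job, a What-if Engine estimates how the job would behave under different settings, and a Cost-based Optimizer picks the best settings. The Python sketch below illustrates that idea only in toy form. It is not Starfish's cost model or search strategy: the profile fields, the constants `SPLIT_MB`, `TASK_STARTUP_S`, and `SPILL_COST_S`, and the functions `what_if` and `optimize` are all hypothetical, and a plain grid search stands in for whatever search Starfish actually performs.

```python
import math
from dataclasses import dataclass
from itertools import product

# Hypothetical constants for this toy model (not taken from Starfish or Hadoop).
SPLIT_MB = 64.0        # assumed input split size per map task
TASK_STARTUP_S = 2.0   # assumed fixed launch cost of any task, in seconds
SPILL_COST_S = 0.5     # assumed fixed cost of each map-side spill, in seconds


@dataclass(frozen=True)
class JobProfile:
    """Hypothetical statistics a profiler might collect from one run of a job."""
    input_mb: float         # total map input size
    map_selectivity: float  # map output bytes / map input bytes
    map_s_per_mb: float     # map CPU seconds per MB of input
    reduce_s_per_mb: float  # reduce CPU seconds per MB of shuffled data


@dataclass(frozen=True)
class Cluster:
    map_slots: int     # concurrent map tasks the cluster can run
    reduce_slots: int  # concurrent reduce tasks the cluster can run


@dataclass(frozen=True)
class Config:
    num_reducers: int
    sort_buffer_mb: int


def what_if(profile: JobProfile, cluster: Cluster, config: Config) -> float:
    """Estimate job runtime (seconds) if the job ran with `config` on `cluster`."""
    num_maps = max(1, math.ceil(profile.input_mb / SPLIT_MB))
    split_mb = profile.input_mb / num_maps
    # A smaller sort buffer forces more spills of the map output to disk.
    spills = max(1, math.ceil(split_mb * profile.map_selectivity / config.sort_buffer_mb))
    map_task_s = TASK_STARTUP_S + split_mb * profile.map_s_per_mb + spills * SPILL_COST_S
    map_waves = math.ceil(num_maps / cluster.map_slots)

    # More reducers split the shuffled data finer, but each extra wave adds latency.
    shuffle_mb = profile.input_mb * profile.map_selectivity
    reduce_task_s = TASK_STARTUP_S + shuffle_mb / config.num_reducers * profile.reduce_s_per_mb
    reduce_waves = math.ceil(config.num_reducers / cluster.reduce_slots)
    return map_waves * map_task_s + reduce_waves * reduce_task_s


def optimize(profile, cluster, reducer_choices, buffer_choices) -> Config:
    """Exhaustively search a small grid and return the cheapest configuration."""
    grid = (Config(r, b) for r, b in product(reducer_choices, buffer_choices))
    return min(grid, key=lambda c: what_if(profile, cluster, c))


if __name__ == "__main__":
    profile = JobProfile(input_mb=1024, map_selectivity=1.0, map_s_per_mb=0.1, reduce_s_per_mb=0.2)
    cluster = Cluster(map_slots=8, reduce_slots=4)
    best = optimize(profile, cluster, [1, 2, 4, 8, 16], [50, 100, 200])
    print(best)  # Config(num_reducers=4, sort_buffer_mb=100)
```

In this toy model, four reducers win because they fill the cluster's four reduce slots in a single wave, while eight or more reducers need extra waves whose startup costs outweigh the finer partitioning. A realistic optimizer would need a much richer cost model and a smarter search over a far larger parameter space.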
Related papers
50 in total
  • [31] A worldwide cost-based design and optimization of tilted bifacial solar farms
    Patel, M. Tahir
    Khan, M. Ryyan
    Sun, Xingshu
    Alam, Muhammad A.
    [J]. APPLIED ENERGY, 2019, 247 : 467 - 479
  • [32] Elements of cost-based tolerancing
    Youngworth, RN
    Stone, BD
    [J]. OPTICAL REVIEW, 2001, 8 (04) : 276 - 280
  • [33] Early classification of time series: Cost-based optimization criterion and algorithms
    Achenchabe, Youssef
    Bondu, Alexis
    Cornuejols, Antoine
    Dachraoui, Asma
    [J]. MACHINE LEARNING, 2021, 110 (06) : 1481 - 1504
  • [34] Task response time optimization using cost-based operation motion
    Tabbara, Bassam
    Tabbara, Abdallah
    Sangiovanni-Vincentelli, Alberto
    [J]. Hardware/Software Codesign - Proceedings of the International Workshop, 2000, : 110 - 114
  • [35] A Lifecycle Cost-based Design Optimization Model for Stormwater Management Systems
    Huang, Jinhui
    James, William
    Robert, W.
    James, C.
    [J]. JOURNAL OF WATER MANAGEMENT MODELING, 2005, : 53 - 70
  • [36] A statistics propagation approach to enable cost-based optimization of statement sequences
    Kraft, Tobias
    Schwarz, Holger
    Mitschang, Bernhard
    [J]. ADVANCES IN DATABASES AND INFORMATION SYSTEMS, PROCEEDINGS, 2007, 4690 : 267 - +
  • [37] Cost-based temporal reasoning
    Santos, Eugene, Jr.
    [J]. INFORMATION SCIENCES, 2019, 482 : 392 - 418
  • [38] A cost-based pricing analysis
    Katsigiannis, Michail
    [J]. 2014 1ST INTERNATIONAL CONFERENCE ON 5G FOR UBIQUITOUS CONNECTIVITY (5GU), 2014, : 264 - 266
  • [39] COST-BASED ACCEPTANCE SAMPLING
    CASE, KE
    BENNETT, GK
    SCHMIDT, JW
    [J]. INDUSTRIAL ENGINEERING, 1972, 4 (11) : 26 - &
  • [40] Cost-based Database Scaling
    Orugnati, V. S. Srujana
    [J]. 2017 7TH IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE (IACC), 2017, : 895 - 900