BigBench V2: The New and Improved BigBench

Cited by: 18
Authors
Ghazal, Ahmad [1]
Ivanov, Todor [2]
Kostamaa, Pekka [3]
Crolotte, Alain [4]
Voong, Ryan [5]
Al-Kateb, Mohammed [4]
Ghazal, Waleed [6]
Zicari, Roberto V. [2]
Affiliations
[1] Futurewei Technologies Inc, Santa Clara, CA 95050 USA
[2] Goethe University Frankfurt, Frankfurt Big Data Lab, Frankfurt, Germany
[3] OpenX, Culver City, CA USA
[4] Teradata Labs, El Segundo, CA USA
[5] University of California Los Angeles, Los Angeles, CA USA
[6] Redondo Union High School, Redondo Beach, CA USA
DOI
10.1109/ICDE.2017.167
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Benchmarking Big Data solutions has been gaining a lot of attention from research and industry. BigBench is one of the most popular benchmarks in this area and was adopted by the TPC as TPCx-BB. BigBench, however, has key shortcomings. The structured component of its data model is the same as the TPC-DS data model, which is a complex snowflake-like schema. This is contrary to the simple star-schema Big Data models found in real life. BigBench also treats the semi-structured web-logs more or less as a structured table. In real life, web-logs are modeled as key-value pairs with unknown schema, and specific keys are captured at query time, a process referred to as late binding. In addition, eleven of the thirty BigBench queries are TPC-DS queries. These queries are complex SQL applied to the structured part of the data model, which again is not typical of Big Data workloads. In this paper, we present BigBench V2 to address the aforementioned limitations of the original BigBench. BigBench V2 is completely independent of TPC-DS, with a new data model and an overhauled workload. The new data model has a simple structured component. Web-logs are modeled as key-value pairs with a substantial and variable number of keys. BigBench V2 mandates late binding by requiring query processing to be done directly on the key-value web-logs rather than on a pre-parsed form of them. A new scale factor-based data generator is implemented to produce structured tables, key-value semi-structured web-logs, and unstructured data. We implemented and executed BigBench V2 on Hive. Our proof of concept shows the feasibility of BigBench V2 and outlines different ways of implementing late binding.
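The late-binding idea in the abstract can be illustrated with a small sketch. This is not the paper's Hive implementation: it assumes, purely for illustration, that each web-log record is a JSON object on its own line, and the key names (`wl_customer_id`, `wl_item_id`, and so on) and the function `query_views_per_item` are hypothetical. The point it shows is that the raw records are never converted into a fixed-schema table; the query pulls out only the key it needs while it runs.

```python
import json
from collections import Counter

# Hypothetical raw web-log lines. Each record is a set of key-value pairs,
# and the set of keys varies from record to record (no fixed schema).
RAW_WEB_LOGS = [
    '{"wl_customer_id": 1, "wl_item_id": 10, "wl_browser": "firefox"}',
    '{"wl_customer_id": 2, "wl_item_id": 10}',
    '{"wl_customer_id": 1, "wl_item_id": 11, "wl_referrer": "search"}',
    '{"wl_customer_id": 3, "wl_webpage_name": "home"}',
]


def query_views_per_item(raw_lines):
    """Count views per item, binding to the item key at query time.

    The raw lines are parsed inside the query itself; records that do not
    carry the requested key are skipped rather than rejected.
    """
    counts = Counter()
    for line in raw_lines:
        record = json.loads(line)          # parse the raw record on the fly
        item = record.get("wl_item_id")    # late binding: look up only this key
        if item is not None:
            counts[item] += 1
    return dict(counts)


if __name__ == "__main__":
    print(query_views_per_item(RAW_WEB_LOGS))
```

A different query could bind to a different key (for example the hypothetical `wl_browser`) against the same raw lines without any reload or schema change, which is the property that pre-parsing the logs into a table would hide.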
Pages: 1225-1236
Page count: 12