Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing

Cited by: 30
Authors
Camacho-Rodriguez, Jesus [1]
Chauhan, Ashutosh [1]
Gates, Alan [1]
Koifman, Eugene [1]
O'Malley, Owen [1]
Garg, Vineet [1]
Haindrich, Zoltan [1]
Shelukhin, Sergey [1]
Jayachandran, Prasanth [1]
Seth, Siddharth [1]
Jaiswal, Deepak [1]
Bouguerra, Slim [1]
Bangarwa, Nishant [1]
Hariappan, Sankar [1]
Agarwal, Anishek [1]
Dere, Jason [1]
Dai, Daniel [1]
Nair, Thejas [1]
Dembla, Nita [1]
Vijayaraghavan, Gopal [1]
Hagleitner, Guenther [1]
Affiliations
[1] Hortonworks Inc, Santa Clara, CA 95054 USA
Keywords
Databases; Data Warehouses; Hadoop; Hive
DOI
10.1145/3299869.3314045
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Apache Hive is an open-source relational database system for analytic big-data workloads. In this paper, we describe the key innovations on the journey from batch tool to fully fledged enterprise data warehousing system. We present a hybrid architecture that combines traditional MPP techniques with more recent big data and cloud concepts to achieve the scale and performance required by today's analytic applications. We explore the system by detailing enhancements along four main axes: transactions, optimizer, runtime, and federation. We then provide experimental results to demonstrate the performance of the system for typical workloads and conclude with a look at the community roadmap.
Pages: 1773-1786
Page count: 14
Related papers
38 in total
  • [1] MapReduce Research on Warehousing of Big Data
    Pticek, M.
    Vrdoljak, B.
    [J]. 2017 40TH INTERNATIONAL CONVENTION ON INFORMATION AND COMMUNICATION TECHNOLOGY, ELECTRONICS AND MICROELECTRONICS (MIPRO), 2017, : 1361 - 1366
  • [2] Performance Analysis of ECG Big Data using Apache Hive and Apache Pig
    Ahmad, Mudassar
    Kanwal, Safina
    Cheema, Maryam
    Habib, Muhammad Asif
    [J]. 2019 8TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGIES (ICICT 2019), 2019, : 2 - 7
  • [3] Implementation Patterns for Zone Architectures in Enterprise-Grade Data Lakes
    Giebler, Corinna
    Groeger, Christoph
    Hoos, Eva
    Schwarz, Holger
    Mitschang, Bernhard
    [J]. ADVANCED INFORMATION SYSTEMS ENGINEERING, CAISE 2024, 2024, 14663 : 267 - 283
  • [4] A Zone Reference Model for Enterprise-Grade Data Lake Management
    Giebler, Corinna
    Groger, Christoph
    Hoos, Eva
    Schwarz, Holger
    Mitschang, Bernhard
    [J]. 2020 IEEE 24TH INTERNATIONAL ENTERPRISE DISTRIBUTED OBJECT COMPUTING CONFERENCE (EDOC 2020), 2020, : 57 - 66
  • [5] Optimization of Multiple Queries for Big Data with Apache Hadoop/Hive
    Garg, Varun
    [J]. 2015 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS (CICN), 2015, : 938 - 941
  • [6] Data Caching for Enterprise-Grade Petabyte-Scale OLAP
    Tang, Chunxu
    Fan, Bin
    Zhao, Jing
    Liang, Chen
    Wang, Yi
    Wang, Beinan
    Qiu, Ziyue
    Qiu, Lu
    Ding, Bowen
    Sun, Shouzhuo
    Che, Saiguang
    Mai, Jiaming
    Chen, Shouwei
    Zhu, Yu
    Xie, Jianjian
    Sun, Yutian
    Li, Yao
    Zhang, Yangjun
    Wang, Ke
    Chen, Mingmin
    [J]. PROCEEDINGS OF THE 2024 USENIX ANNUAL TECHNICAL CONFERENCE, ATC 2024, 2024, : 901 - 915
  • [7] Big Data Emerging Technologies: A CaseStudy with Analyzing Twitter Data using Apache Hive
    Bhardwaj, Aditya
    Vanraj
    Kumar, Ankit
    Narayan, Yogendra
    Kumar, Pawan
    [J]. 2015 2ND INTERNATIONAL CONFERENCE ON RECENT ADVANCES IN ENGINEERING & COMPUTATIONAL SCIENCES (RAECS), 2015,
  • [8] Theoretical and Empirical Analysis of Usage of MapReduce and Apache Tez in Big Data
    Singh, Rupinder
    Kaur, Puneet Jai
    [J]. PROCEEDINGS OF FIRST INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY FOR INTELLIGENT SYSTEMS: VOL 2, 2016, 51 : 529 - 536
  • [9] Evaluating partitioning and bucketing strategies for Hive-based Big Data Warehousing systems
    Eduarda Costa
    Carlos Costa
    Maribel Yasmina Santos
    [J]. Journal of Big Data, 6
  • [10] Evaluating partitioning and bucketing strategies for Hive-based Big Data Warehousing systems
    Costa, Eduarda
    Costa, Carlos
    Santos, Maribel Yasmina
    [J]. JOURNAL OF BIG DATA, 2019, 6 (01)