Optimization of Multiple Queries for Big Data with Apache Hadoop/Hive

Cited by: 9

Authors:

Garg, Varun [1]

Affiliation:

[1] GGITS, Dept Comp Sci & Engn, Jabalpur, India

Source:

2015 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS (CICN) | 2015

Keywords:

Hadoop; Hive; Multiple-query Optimization; Distributed Data Warehouse;

DOI:

10.1109/CICN.2015.184

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It sits on top of Hadoop to summarize Big Data and makes querying and analysis easy. The Hadoop MapReduce framework speeds up the execution of queries. This manuscript proposes using a Multi-Query Optimization (MQO) technique to improve the overall performance of Hadoop/Hive. When multiple queries execute simultaneously, many opportunities can arise for sharing search and/or computation tasks across queries. Executing common jobs only once can remarkably reduce the total execution time of all queries. Our framework transforms a set of interrelated HiveQL queries into new global queries that produce the same results in a remarkably smaller total execution time. Experiments show that the proposed Hive (Distributed Hive) reduces the total execution time of correlated TPC-H queries by 20-50% compared with conventional Hive, depending on the number of queries and the percentage of shared tasks.
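
The idea of executing common jobs only once can be illustrated with a minimal sketch. It is not the paper's framework: the task names, the costs, and the two helper functions are hypothetical, and the model simply assumes that a task shared by several queries can be run once and its result reused at no extra cost.

```python
from typing import Dict, List

# Hypothetical per-task costs (arbitrary time units), not taken from the paper.
TASK_COST: Dict[str, float] = {
    "scan_lineitem": 40.0,
    "scan_orders": 20.0,
    "join_lineitem_orders": 30.0,
    "agg_q1": 5.0,
    "agg_q2": 5.0,
    "agg_q3": 5.0,
}

# Hypothetical workload: each query is listed as the tasks it needs.
QUERIES: Dict[str, List[str]] = {
    "q1": ["scan_lineitem", "agg_q1"],
    "q2": ["scan_lineitem", "scan_orders", "join_lineitem_orders", "agg_q2"],
    "q3": ["scan_lineitem", "scan_orders", "join_lineitem_orders", "agg_q3"],
}


def naive_cost(queries: Dict[str, List[str]], cost: Dict[str, float]) -> float:
    """Total cost when every query runs all of its own tasks independently."""
    return sum(cost[task] for tasks in queries.values() for task in tasks)


def shared_cost(queries: Dict[str, List[str]], cost: Dict[str, float]) -> float:
    """Total cost when each distinct task runs once and its result is reused."""
    distinct_tasks = set()
    for tasks in queries.values():
        distinct_tasks.update(tasks)
    return sum(cost[task] for task in distinct_tasks)


if __name__ == "__main__":
    independent = naive_cost(QUERIES, TASK_COST)
    merged = shared_cost(QUERIES, TASK_COST)
    print(f"independent: {independent}, shared: {merged}")
    print(f"reduction: {1 - merged / independent:.0%}")
```

The saving in this toy model grows with the number of queries and with the fraction of tasks they have in common, which is the same dependence the abstract reports. The figures it prints come from the made-up costs above and say nothing about the paper's measurements.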


Pages: 938-941

Page count: 4

