Optimization of Multiple Queries for Big Data with Apache Hadoop/Hive

Cited by: 9

Author:

Garg, Varun [1]

Affiliation:

[1] GGITS, Dept Comp Sci & Engn, Jabalpur, India

Source:

2015 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS (CICN) | 2015

Keywords:

Hadoop; Hive; Multiple-query Optimization; Distributed Data Warehouse

DOI:

10.1109/CICN.2015.184

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It resides on top of Hadoop to summarize Big Data and makes querying and analysis easy. The Hadoop MapReduce framework speeds up query execution. This manuscript proposes using the Multi-Query Optimization (MQO) technique to enhance the overall performance of Hadoop/Hive. When multiple queries execute simultaneously, many opportunities can arise to share search and/or computation tasks among them. Executing common jobs only once can remarkably reduce the total execution time of all queries. Our framework transforms a set of interrelated HiveQL queries into new global queries that produce the same results in a remarkably smaller total execution time. Experiments show that the proposed Hive (Distributed Hive) outperforms conventional Hive, reducing the total execution time of correlated TPC-H queries by 20-50% depending on the number of queries and the percentage of shared tasks.
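To make the idea of executing common jobs only once concrete, here is a minimal single-process sketch in Python. It assumes a simplified setting in which each query scans one table, filters rows with a predicate, and computes a grouped SUM; queries that read the same table are merged into one shared scan, loosely mirroring how a set of interrelated queries can be rewritten into a global query. The names `Query`, `run_independently`, `run_shared`, and the toy `lineitem` rows are hypothetical illustrations, not the paper's framework or any Hive API, and the sketch counts table scans rather than measuring MapReduce execution time.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Result = Dict[str, Dict[object, float]]


@dataclass(frozen=True)
class Query:
    """Hypothetical query shape: SELECT key, SUM(value) FROM table WHERE predicate GROUP BY key."""
    name: str
    table: str
    predicate: Callable[[Row], bool]
    key: str
    value: str


def run_independently(queries: List[Query], tables: Dict[str, List[Row]]) -> Tuple[Result, int]:
    """Baseline: every query performs its own full scan of its input table."""
    results: Result = {}
    scans = 0
    for q in queries:
        scans += 1
        agg: Dict[object, float] = defaultdict(float)
        for row in tables[q.table]:
            if q.predicate(row):
                agg[row[q.key]] += row[q.value]
        results[q.name] = dict(agg)
    return results, scans


def run_shared(queries: List[Query], tables: Dict[str, List[Row]]) -> Tuple[Result, int]:
    """MQO-style sketch: group queries by input table and scan each table only once."""
    groups: Dict[str, List[Query]] = defaultdict(list)
    for q in queries:
        groups[q.table].append(q)
    partial = {q.name: defaultdict(float) for q in queries}
    scans = 0
    for table, group in groups.items():
        scans += 1  # one shared scan serves every query in this group
        for row in tables[table]:
            for q in group:
                if q.predicate(row):
                    partial[q.name][row[q.key]] += row[q.value]
    return {name: dict(agg) for name, agg in partial.items()}, scans


# Toy data reusing a few TPC-H lineitem column names (the values are made up).
tables = {
    "lineitem": [
        {"l_returnflag": "A", "l_linestatus": "F", "l_quantity": 10, "l_extendedprice": 100.0},
        {"l_returnflag": "N", "l_linestatus": "O", "l_quantity": 5, "l_extendedprice": 50.0},
        {"l_returnflag": "A", "l_linestatus": "F", "l_quantity": 20, "l_extendedprice": 300.0},
    ]
}
queries = [
    Query("q1", "lineitem", lambda r: r["l_quantity"] > 6, "l_returnflag", "l_extendedprice"),
    Query("q2", "lineitem", lambda r: True, "l_linestatus", "l_quantity"),
]

independent, independent_scans = run_independently(queries, tables)
shared, shared_scans = run_shared(queries, tables)
print(shared, independent_scans, shared_scans)
# {'q1': {'A': 400.0}, 'q2': {'F': 30.0, 'O': 5.0}} 2 1
```

Both strategies return identical results, but the shared version reads `lineitem` once instead of twice. The per-row predicate work is unchanged; only the cost of reading the common input is paid once per group rather than once per query, so in this simplified model the saving grows with the number of queries that share a task. Hive's own multi-table insert syntax (`FROM ... INSERT ... SELECT ... INSERT ... SELECT ...`) applies a similar single-scan idea to several queries over one source.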

Pages: 938-941

Number of pages: 4