Optimization of Multiple Queries for Big Data with Apache Hadoop/Hive

Cited by: 9
Authors
Garg, Varun [1]
Affiliations
[1] GGITS, Dept Comp Sci & Engn, Jabalpur, India
Keywords
Hadoop; Hive; Multiple-query Optimization; Distributed Data Warehouse;
DOI
10.1109/CICN.2015.184
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It sits on top of Hadoop to summarize Big Data and makes querying and analysis easy. The Hadoop MapReduce framework speeds up the execution of queries. This manuscript proposes the use of a Multi-Query Optimization (MQO) technique to enhance the overall performance of Hadoop/Hive. When multiple queries execute simultaneously, many opportunities can arise to share search and/or computation tasks across them. Executing common jobs only once can reduce the total execution time of all queries remarkably. Our framework transforms a set of interrelated HiveQL queries into new global queries that produce the same results in remarkably smaller total execution times. Experiments show that the proposed Hive (Distributed Hive) reduces the total execution time of correlated TPC-H queries by 20-50% relative to conventional Hive, depending on the number of queries and the percentage of shared tasks.
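The idea of executing common work only once can be illustrated with a minimal sketch. This is not the paper's framework and does not use Hive or Hadoop: it is a toy, in-memory Python model in which the table `LINEITEM`, the query tuples in `QUERIES`, and the helpers `run_separately` and `run_shared` are all hypothetical names chosen for illustration. It assumes each query is a filtered sum over a single table, and it shows that grouping queries by the table they read lets one scan serve all of them while producing the same results.

```python
from collections import defaultdict

# Hypothetical toy rows standing in for a TPC-H-like lineitem table.
LINEITEM = [
    {"orderkey": 1, "quantity": 17, "price": 100.0},
    {"orderkey": 1, "quantity": 36, "price": 250.0},
    {"orderkey": 2, "quantity": 8, "price": 40.0},
    {"orderkey": 3, "quantity": 28, "price": 310.0},
]
TABLES = {"lineitem": LINEITEM}

# Each query is (name, table, row predicate, column to sum). Illustrative only.
QUERIES = [
    ("q1", "lineitem", lambda r: r["quantity"] > 10, "price"),
    ("q2", "lineitem", lambda r: r["orderkey"] == 1, "quantity"),
    ("q3", "lineitem", lambda r: r["price"] < 300, "price"),
]


def run_separately(queries, tables):
    """Baseline: every query performs its own full scan of its table."""
    results, scans = {}, 0
    for name, table, pred, col in queries:
        scans += 1
        results[name] = sum(r[col] for r in tables[table] if pred(r))
    return results, scans


def run_shared(queries, tables):
    """Shared scan: queries over the same table are answered in one pass."""
    by_table = defaultdict(list)
    for query in queries:
        by_table[query[1]].append(query)

    results = {query[0]: 0 for query in queries}
    scans = 0
    for table, group in by_table.items():
        scans += 1  # one scan per distinct table, not per query
        for row in tables[table]:
            for name, _, pred, col in group:
                if pred(row):
                    results[name] += row[col]
    return results, scans


separate_results, separate_scans = run_separately(QUERIES, TABLES)
shared_results, shared_scans = run_shared(QUERIES, TABLES)
print(separate_results == shared_results, separate_scans, shared_scans)
```

In this toy setting the three queries need three scans when run separately and one scan when merged. The saving in a real cluster would depend on how much of each query's work is actually shared, which is consistent with the abstract's note that the gain varies with the number of queries and the percentage of shared tasks.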
Pages: 938-941
Page count: 4