Optimization of Multiple Queries for Big Data with Apache Hadoop/Hive

Cited by: 9
Author
Garg, Varun [1]
Affiliation
[1] GGITS, Dept Comp Sci & Engn, Jabalpur, India
Keywords
Hadoop; Hive; Multiple-query Optimization; Distributed Data Warehouse
DOI
10.1109/CICN.2015.184
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It resides on top of Hadoop to summarize Big Data and makes querying and analysis easy. The Hadoop MapReduce framework speeds up the execution of queries. This manuscript proposes using the Multi-Query Optimization (MQO) technique to enhance the overall performance of Hadoop/Hive. When multiple queries execute simultaneously, many opportunities can arise for distributing search and/or computation tasks. Executing common jobs only once can substantially reduce the total execution time of all queries. Our framework transforms a set of interrelated HiveQL queries into new global queries that produce the same results in a much smaller total execution time. It is shown experimentally that the proposed system (Distributed Hive) reduces the total execution time of correlated TPC-H queries by 20-50% compared with conventional Hive, depending on the number of queries and the percentage of shared tasks.
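The idea that executing common jobs only once lowers total execution time can be illustrated with a small cost model. This is a minimal sketch under stated assumptions, not the framework described in the abstract: the query names, task names, and cost figures below are hypothetical and chosen only for illustration, and the function `total_cost` is not part of Hive or Hadoop.

```python
def total_cost(queries, task_cost, share=False):
    """Sum task costs over a set of queries.

    queries:   dict mapping a query name to the set of task names it needs
    task_cost: dict mapping a task name to an abstract cost (e.g. seconds)
    share:     if True, each distinct task is counted once across all queries
    """
    if share:
        # Union of all tasks: a task needed by several queries runs only once.
        distinct_tasks = set().union(*queries.values())
        return sum(task_cost[t] for t in distinct_tasks)
    # No sharing: every query pays for every task it needs.
    return sum(task_cost[t] for tasks in queries.values() for t in tasks)


# Hypothetical workload: three correlated queries that scan the same tables.
queries = {
    "q1": {"scan_lineitem", "agg_q1"},
    "q2": {"scan_lineitem", "scan_orders", "join_q2"},
    "q3": {"scan_lineitem", "scan_orders", "agg_q3"},
}

# Hypothetical per-task costs (arbitrary units, not measured values).
task_cost = {
    "scan_lineitem": 40,
    "scan_orders": 20,
    "agg_q1": 10,
    "join_q2": 30,
    "agg_q3": 10,
}

separate = total_cost(queries, task_cost)            # each query run on its own
shared = total_cost(queries, task_cost, share=True)  # common tasks run once
reduction = 1 - shared / separate

print(f"separate={separate} shared={shared} reduction={reduction:.1%}")
```

In this toy model the saving grows with the number of queries and with the fraction of tasks they have in common, which matches the dependence the abstract describes. The figures themselves are made up and say nothing about the paper's measured results.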
Pages: 938-941
Page count: 4
Related papers
50 in total
  • [41] Analyzing Network Traffic Data Using Hive Queries
    Patel, Dharaben
    Yuan, Xiaohong
    Roy, Kaushik
    Abernathy, Aakiel
    [J]. SOUTHEASTCON 2017, 2017,
  • [42] Performance Analysis of Queries with Hive Optimized Data Models
    Sharma, Meghna
    Kaur, Jagdeep
    [J]. PROCEEDINGS OF RECENT INNOVATIONS IN COMPUTING, ICRIC 2019, 2020, 597 : 687 - 698
  • [43] Evaluation of Apache Hadoop for parallel data analysis with ROOT
    Lehrack, S.
    Duckeck, G.
    Ebke, J.
    [J]. 20TH INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY AND NUCLEAR PHYSICS (CHEP2013), PARTS 1-6, 2014, 513
  • [44] Multiple Queries Optimization for Data Streams on Cloud Computing
    Najib, Fatma M.
    Ismail, Rasha M.
    Badr, Nagwa L.
    Tolba, M. F.
    [J]. 2015 TENTH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING & SYSTEMS (ICCES), 2015, : 28 - 33
  • [45] GAGPC: Optimization of multiple continuous queries on data streams
    Suh, Young-Kyoon
    Son, Jin Hyun
    Kim, Myoung Ho
    [J]. PROCEEDINGS OF THE IASTED INTERNATIONAL CONFERENCE ON DATABASES AND APPLICATIONS, 2006, : 215 - +
  • [46] An optimal framework for spatial query optimization using hadoop in big data analytics
    Dadheech, Pankaj
    Goyal, Dinesh
    Srivastava, Sumit
    Kumar, Ankit
    [J]. Recent Advances in Computer Science and Communications, 2020, 13 (06): : 1188 - 1198
  • [47] Performance optimization of computing task scheduling based on the Hadoop big data platform
    Li, Yang
    Hei, Xinhong
    [J]. NEURAL COMPUTING & APPLICATIONS, 2022,
  • [48] Big data and Spark: Comparison with Hadoop
    Benlachmi, Yassine
    Hasnaoui, Moulay Lahcen
    [J]. PROCEEDINGS OF THE 2020 FOURTH WORLD CONFERENCE ON SMART TRENDS IN SYSTEMS, SECURITY AND SUSTAINABILITY (WORLDS4 2020), 2020, : 811 - 817
  • [49] Handling Big Data with Hadoop Toolkit
    Devakunchari, R.
    [J]. 2014 INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND EMBEDDED SYSTEMS (ICICES), 2014,
  • [50] Hadoop: Addressing Challenges of Big Data
    Singh, Kamalpreet
    Kaur, Ravinder
    [J]. SOUVENIR OF THE 2014 IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE (IACC), 2014, : 686 - 689