Optimization of Multiple Queries for Big Data with Apache Hadoop/Hive

Cited by: 9
Authors
Garg, Varun [1]
Affiliation
[1] GGITS, Dept Comp Sci & Engn, Jabalpur, India
Keywords
Hadoop; Hive; Multiple-query Optimization; Distributed Data Warehouse;
DOI
10.1109/CICN.2015.184
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It sits on top of Hadoop to summarize Big Data and makes querying and analysis easy. The Hadoop MapReduce framework speeds up the execution of queries. This manuscript proposes the use of a Multi Query Optimization (MQO) technique to enhance the overall performance of Hadoop/Hive. When multiple queries execute simultaneously, many opportunities can arise for sharing search and/or computation tasks. Executing common jobs only once can markedly reduce the total execution time of all queries. Our framework transforms a set of interrelated HiveQL queries into new global queries that produce the same results in a markedly smaller total execution time. Experiments show that the proposed Hive (Distributed Hive) reduces the total execution time of correlated TPC-H queries by 20-50% compared with conventional Hive, depending on the number of queries and the percentage of shared tasks.
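The idea of executing a common job only once can be illustrated with a minimal Python sketch. This is not the paper's framework and it does not use Hive or Hadoop: the `Table` class, the toy `LINEITEM` rows, and the two example aggregations are all hypothetical stand-ins, chosen only to show how two queries over the same input might be merged into one "global" query that scans the data once and still returns the same results.

```python
from collections import defaultdict

# Hypothetical toy rows, loosely shaped like a TPC-H lineitem table.
LINEITEM = [
    {"returnflag": "A", "shipmode": "AIR", "quantity": 10, "price": 100.0},
    {"returnflag": "R", "shipmode": "MAIL", "quantity": 5, "price": 40.0},
    {"returnflag": "A", "shipmode": "MAIL", "quantity": 7, "price": 70.0},
    {"returnflag": "N", "shipmode": "AIR", "quantity": 3, "price": 30.0},
]


class Table:
    """Hypothetical in-memory table that counts how often it is fully scanned."""

    def __init__(self, rows):
        self.rows = rows
        self.scans = 0

    def scan(self):
        self.scans += 1
        return iter(self.rows)


def q1_separate(table):
    """Query 1 on its own: total quantity per return flag (one full scan)."""
    totals = defaultdict(int)
    for row in table.scan():
        totals[row["returnflag"]] += row["quantity"]
    return dict(totals)


def q2_separate(table):
    """Query 2 on its own: total price per ship mode (another full scan)."""
    totals = defaultdict(float)
    for row in table.scan():
        totals[row["shipmode"]] += row["price"]
    return dict(totals)


def merged_global_query(table):
    """Both queries answered together: the shared scan runs only once."""
    qty_by_flag = defaultdict(int)
    price_by_mode = defaultdict(float)
    for row in table.scan():
        qty_by_flag[row["returnflag"]] += row["quantity"]
        price_by_mode[row["shipmode"]] += row["price"]
    return dict(qty_by_flag), dict(price_by_mode)


if __name__ == "__main__":
    separate_table = Table(LINEITEM)
    separate = (q1_separate(separate_table), q2_separate(separate_table))

    shared_table = Table(LINEITEM)
    merged = merged_global_query(shared_table)

    print("separate scans:", separate_table.scans)
    print("merged scans:", shared_table.scans)
    print("same results:", separate == merged)
```

In this toy setting the saving is one avoided pass over four rows; the abstract's claim concerns the analogous saving when the shared work is a MapReduce job over a large table.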
Pages: 938 - 941
Number of pages: 4