Performance Analysis of Queries with Hive Optimized Data Models

Cited by: 2
Authors
Sharma, Meghna [1]
Kaur, Jagdeep [1]
Affiliation
[1] NorthCap Univ, Gurugram, Haryana, India
Keywords
Big Data; Hadoop; Hive; Partitioning; Bucket methods
DOI
10.1007/978-3-030-29407-6_49
Chinese Library Classification
TP301 [Theory and Methods]
Discipline classification code
081202
Abstract
Hive, a data warehouse tool, handles the processing of structured data in Hadoop. It runs on top of Hadoop and helps to analyze, query, and review Big Data. Using Hadoop MapReduce has drastically reduced query execution time. This paper presents a detailed comparison of data model optimization techniques, namely partitioning and bucket methods, for improving the processing time of Hive queries. The implementation uses data from the New York Police Portal, with AWS services for storage and the Hive tool in the Hadoop ecosystem for querying. Partitioning showed a remarkable improvement in execution time.
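The two techniques compared in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation and does not run Hive: the rows, the column names (`borough`, `complaint_id`), and all function names are hypothetical, and the bucket assignment uses a plain modulo in place of Hive's hash function. The sketch only shows why a filter on a partition column reads fewer rows than a full scan.

```python
from collections import defaultdict

# Hypothetical rows (borough, complaint_id); not taken from the paper's dataset.
ROWS = [
    ("BRONX", 1),
    ("QUEENS", 2),
    ("BRONX", 3),
    ("BROOKLYN", 4),
    ("QUEENS", 5),
]


def build_partitions(rows):
    """Group rows by the partition column, one group per distinct value."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[0]].append(row)
    return parts


def bucket_of(complaint_id, num_buckets):
    """Assign a row to a bucket (assumption: modulo stands in for a hash)."""
    return complaint_id % num_buckets


def scan_full(rows, borough):
    """Unpartitioned layout: every row is read, then filtered."""
    hits = [row for row in rows if row[0] == borough]
    return hits, len(rows)


def scan_pruned(parts, borough):
    """Partitioned layout: only the matching partition is read."""
    part = parts.get(borough, [])
    return list(part), len(part)


if __name__ == "__main__":
    full_hits, full_read = scan_full(ROWS, "BRONX")
    pruned_hits, pruned_read = scan_pruned(build_partitions(ROWS), "BRONX")
    print("rows read without partitioning:", full_read)
    print("rows read with partitioning:", pruned_read)
    print("same result:", full_hits == pruned_hits)
```

Both scans return the same rows; the partitioned layout simply reads fewer of them. Bucketing, by contrast, spreads rows over a fixed number of groups keyed on a column value, which is what `bucket_of` mimics.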
Pages: 687-698
Page count: 12