Profiling Apache HIVE Query from Run Time Logs

Cited by: 0
Authors
Haryono, Givanna Putri [1]
Zhou, Ying [1]
Affiliations
[1] Univ Sydney, Sch Informat Technol, Sydney, NSW 2008, Australia
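Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract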
Apache Hive is a widely used data warehousing and analysis tool. Developers write SQL-like Hive queries, which are converted into MapReduce programs that run on a cluster. Despite Hive's popularity, there is little research on comparing and diagnosing query performance. Part of the reason is that the instrumentation techniques used to monitor execution cannot be applied to the intermediate MapReduce code generated from a Hive query. Because the generated MapReduce code is hidden from developers, run-time logs are the only place where a developer can get a glimpse of the actual execution. An automatic tool that extracts information from logs and generates reports is therefore essential for understanding query execution behavior. We designed a tool that builds the execution profile of an individual Hive query by extracting information from Hive and Hadoop logs. The profile consists of detailed information about the MapReduce jobs, tasks, and attempts belonging to a query. It is stored as a JSON document in MongoDB and can be retrieved to generate reports as charts or tables. We ran several experiments on AWS with TPC-H data sets and queries to demonstrate that our profiling tool can assist developers in comparing Hive queries written in different formats, running on different data sets, and configured with different parameters. It can also compare tasks and attempts within the same job to diagnose performance issues.
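The query, job, task, and attempt hierarchy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' tool: the input records (attempt ID, start time, end time) are assumed to have already been extracted from the logs, and the function names `build_profile` and `slowest_attempt`, the field names, and the sample values are all hypothetical. The only fact relied on is the standard Hadoop attempt ID layout, from which the enclosing task and job IDs can be derived.

```python
import json
import re

# Hadoop attempt IDs look like attempt_<clusterTs>_<jobSeq>_<m|r>_<taskSeq>_<attemptNo>.
ATTEMPT_RE = re.compile(r"attempt_(\d+)_(\d+)_([mr])_(\d+)_(\d+)")


def build_profile(query_id, records):
    """Group (attempt_id, start_ms, end_ms) records into a nested profile.

    The record layout and the field names are assumptions made for this
    sketch; the abstract does not specify the tool's actual schema.
    """
    profile = {"query_id": query_id, "jobs": {}}
    for attempt_id, start_ms, end_ms in records:
        match = ATTEMPT_RE.fullmatch(attempt_id)
        if match is None:
            continue  # skip anything that is not a well-formed attempt ID
        cluster_ts, job_seq, kind, task_seq, _attempt_no = match.groups()
        job_id = f"job_{cluster_ts}_{job_seq}"
        task_id = f"task_{cluster_ts}_{job_seq}_{kind}_{task_seq}"
        job = profile["jobs"].setdefault(job_id, {"tasks": {}})
        task = job["tasks"].setdefault(
            task_id,
            {"type": "map" if kind == "m" else "reduce", "attempts": []},
        )
        task["attempts"].append(
            {
                "attempt_id": attempt_id,
                "start_ms": start_ms,
                "end_ms": end_ms,
                "duration_ms": end_ms - start_ms,
            }
        )
    return profile


def slowest_attempt(profile, job_id):
    """Return the longest-running attempt within one job."""
    attempts = [
        attempt
        for task in profile["jobs"][job_id]["tasks"].values()
        for attempt in task["attempts"]
    ]
    return max(attempts, key=lambda attempt: attempt["duration_ms"])


# Made-up sample records, for illustration only.
SAMPLE_RECORDS = [
    ("attempt_1400000000000_0001_m_000000_0", 0, 1200),
    ("attempt_1400000000000_0001_m_000001_0", 0, 4800),
    ("attempt_1400000000000_0001_r_000000_0", 5000, 6000),
]

if __name__ == "__main__":
    sample_profile = build_profile("q1", SAMPLE_RECORDS)
    print(json.dumps(sample_profile, indent=2))
```

Because the result is a plain nested dictionary, it serializes directly to JSON; storing it in MongoDB would then be a matter of passing it to a driver call such as PyMongo's `insert_one`, which is omitted here to keep the sketch free of external dependencies. Comparing attempts within a job, as the abstract mentions, reduces to a lookup like `slowest_attempt(sample_profile, "job_1400000000000_0001")`.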
Pages: 61-68
Number of pages: 8