DHive: Query Execution Performance Analysis via Dataflow in Apache Hive

Authors
Zhang, Chaozu [1]
Shen, Qiaomu [2]
Tang, Bo [1]
Affiliations
[1] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen, Peoples R China
[2] Southern Univ Sci & Technol, Res Inst Trustworthy Autonomous Syst, Shenzhen, Peoples R China
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2023 / Vol. 16 / No. 12
DOI
10.14778/3611540.3611605
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Apache Hive is widely used for large-scale data analysis in many organizations. Various visual analytics tools have been developed to help Hive users quickly analyze the query execution process and identify the performance bottlenecks of executed queries. However, existing tools mostly show the time usage of query sub-components (jobs and operators) and fail to provide enough evidence to analyze the root causes of slow execution. To tackle this problem, we develop DHive, a visual analytics system that visualizes and analyzes query execution progress via dataflow analysis. DHive shows the dataflow during query execution at multiple levels (query level, job level, and task level), which enables users to identify the key jobs and tasks and to explain their time usage by linking them to auxiliary information such as the system configuration and hardware status. We demonstrate the effectiveness of DHive through two cases in a production cluster. DHive is open source at https://github.com/DBGroupSUSTech/DHive.git.
Pages: 3998-4001
Number of pages: 4