DHive: Query Execution Performance Analysis via Dataflow in Apache Hive

Authors
Zhang, Chaozu [1]
Shen, Qiaomu [2]
Tang, Bo [1]
Affiliations
[1] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen, Peoples R China
[2] Southern Univ Sci & Technol, Res Inst Trustworthy Autonomous Syst, Shenzhen, Peoples R China
Source
Proceedings of the VLDB Endowment | 2023 / Vol. 16 / No. 12
DOI
10.14778/3611540.3611605
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Apache Hive is widely used for large-scale data analysis in many organizations. Various visual analytics tools have been developed to help Hive users quickly analyze the query execution process and identify the performance bottlenecks of executed queries. However, existing tools mostly focus on showing the time usage of query sub-components (jobs and operators) and fail to provide enough evidence to analyze the root causes of slow execution progress. To tackle this problem, we develop DHive, a visual analytics system that visualizes and analyzes query execution progress via dataflow analysis. DHive shows the dataflow during query execution at multiple levels (query level, job level, and task level), which enables users to identify the key jobs and tasks and to explain their time usage by linking them to auxiliary information such as the system configuration and hardware status. We demonstrate the effectiveness of DHive with two cases in a production cluster. DHive is open source at https://github.com/DBGroupSUSTech/DHive.git.
Pages: 3998-4001
Page count: 4