DHive: Query Execution Performance Analysis via Dataflow in Apache Hive

Cited by: 0

Authors:

Zhang, Chaozu [1]

Shen, Qiaomu [2]

Tang, Bo [1]

Affiliations:

[1] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen, Peoples R China

[2] Southern Univ Sci & Technol, Res Inst Trustworthy Autonomous Syst, Shenzhen, Peoples R China

Source:

PROCEEDINGS OF THE VLDB ENDOWMENT | 2023 / Vol. 16 / Issue 12

DOI:

10.14778/3611540.3611605

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Apache Hive is widely used for large-scale data analysis in many organizations. Various visual analytics tools have been developed to help Hive users quickly analyze the query execution process and identify the performance bottlenecks of executed queries. However, existing tools mostly focus on showing the time usage of query sub-components (jobs and operators) and fail to provide enough evidence to analyze the root causes of slow execution progress. To tackle this problem, we develop DHive, a visual analytics system that visualizes and analyzes query execution progress via dataflow analysis. DHive shows the dataflow during query execution at multiple levels (query level, job level, and task level), which enables users to identify the key jobs and tasks and to explain their time usage by linking them to auxiliary information such as the system configuration and hardware status. We demonstrate the effectiveness of DHive through two cases in a production cluster. DHive is open-source at https://github.com/DBGroupSUSTech/DHive.git.

Pages: 3998 - 4001

Page count: 4
