Graph neural networks for detecting anomalies in scientific workflows

Cited by: 1

Authors:

Jin, Hongwei [1,6]

Raghavan, Krishnan [1]

Papadimitriou, George [2]

Wang, Cong [3]

Mandal, Anirban [3]

Kiran, Mariam [4]

Deelman, Ewa [2]

Balaprakash, Prasanna [5]

Affiliations:

[1] Argonne Natl Lab, Lemont, IL USA

[2] Univ Southern Calif, Los Angeles, CA USA

[3] Renaissance Comp Inst RENCI, Chapel Hill, NC USA

[4] Energy Sci Network ESnet, Berkeley, CA USA

[5] Oak Ridge Natl Lab, Oak Ridge, TN USA

[6] Argonne Natl Lab, Math & Comp Sci Div, 9700 S Cass Ave, Lemont, IL 60439 USA

Source:

INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS | 2023 / Vol. 37 / Issue 3-4

Keywords:

Anomaly detection; machine learning; graph neural networks; scientific workflows; hyperparameter tuning; explainable predictions;

DOI:

10.1177/10943420231172140

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Identifying and addressing anomalies in complex, distributed systems can be challenging for reliable execution of scientific workflows. We model these workflows as directed acyclic graphs (DAGs), where the nodes and edges of the DAGs represent jobs and their dependencies, respectively. We develop graph neural networks (GNNs) to learn patterns in the DAGs and to detect anomalies at the node (job) and graph (workflow) levels. We investigate workflow-specific GNN models that are trained on a particular workflow and workflow-agnostic GNN models that are trained across the workflows. Our GNN models, which incorporate both individual job features and topological information from the workflow, show improved accuracy and efficiency compared to conventional learning methods for detecting anomalies. When jointly trained on multiple scientific workflows, our GNN models reached an accuracy of more than 80% for workflow-level anomalies and 75% for job-level anomalies. In addition, we illustrate the importance of hyperparameter tuning, which can significantly improve the metrics used to evaluate the GNN models. Finally, we integrate explainable GNN methods to provide insights into the job features in the workflow that cause an anomaly.
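
The DAG formulation in the abstract can be made concrete with a small sketch. The example below is not the authors' model: the job names, the two feature columns, and the mean-aggregation rule are all assumptions chosen for illustration, and no learned weights or nonlinearities are included. It only shows how job features and workflow topology might be combined into job-level and workflow-level representations that a classifier could then score.

```python
# Illustrative only: a toy workflow DAG and one untrained message-passing step.
# Job names, feature values, and the aggregation rule are assumptions,
# not taken from the paper.

# Directed edges point from a job to a job that depends on it.
edges = [("stage_in", "process_a"), ("stage_in", "process_b"),
         ("process_a", "merge"), ("process_b", "merge")]

# Hypothetical per-job features: [runtime_seconds, queue_delay_seconds].
features = {
    "stage_in": [10.0, 1.0],
    "process_a": [30.0, 2.0],
    "process_b": [90.0, 2.0],
    "merge": [20.0, 4.0],
}


def neighbors(edges):
    """Build an undirected neighbor map so information flows both ways."""
    nbrs = {}
    for src, dst in edges:
        nbrs.setdefault(src, []).append(dst)
        nbrs.setdefault(dst, []).append(src)
    return nbrs


def message_pass(features, nbrs):
    """One mean-aggregation step: average a job's features with its neighbors'."""
    updated = {}
    for job, own in features.items():
        group = [own] + [features[n] for n in nbrs.get(job, [])]
        updated[job] = [sum(col) / len(group) for col in zip(*group)]
    return updated


def graph_readout(node_embeddings):
    """Mean-pool the job embeddings into one workflow-level vector."""
    rows = list(node_embeddings.values())
    return [sum(col) / len(rows) for col in zip(*rows)]


nbrs = neighbors(edges)
# Per-job vectors: what a job-level (node) anomaly classifier would consume.
node_embeddings = message_pass(features, nbrs)
# Single vector: what a workflow-level (graph) anomaly classifier would consume.
workflow_embedding = graph_readout(node_embeddings)
```

In this sketch, `process_a` and `process_b` have identical neighbors but different runtimes, so their embeddings still differ after aggregation; a trained GNN would additionally apply learned transformations at each step.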


Pages: 394-411

Page count: 18

50 items in total

[1] LogGD: Detecting Anomalies from System Logs with Graph Neural Networks
Xie, Yongzheng
Zhang, Hongyu
Babar, Muhammad Ali
[J]. 2022 IEEE 22ND INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY, QRS, 2022, : 299 - 310
[2] Detecting Anomalies in Cyber-Physical Systems Using Graph Neural Networks
Vasil'eva, K. V.
Lavrova, D. S.
[J]. AUTOMATIC CONTROL AND COMPUTER SCIENCES, 2021, 55 (08) : 1051 - 1060
[3] Detecting Anomalies in Cyber-Physical Systems Using Graph Neural Networks
K. V. Vasil’eva
D. S. Lavrova
[J]. Automatic Control and Computer Sciences, 2021, 55 : 1051 - 1060
[4] Detecting performance anomalies in scientific workflows using hierarchical temporal memory
Rodriguez, Maria A.
Kotagiri, Ramamohanarao
Buyya, Rajkumar
[J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 88 : 624 - 635
[5] Spatial-Temporal Graph Neural Network for Detecting and Localizing Anomalies in PMU Networks
Behdadnia, Tohid
Thoelen, Klaas
Zobiri, Fairouz
Deconinck, Geert
[J]. DEPENDABLE COMPUTING-EDCC 2024 WORKSHOPS, SAFEAUTONOMY, TRUST IN BLOCKCHAIN, 2024, 2078 : 75 - 82
[6] Detecting Communication Anomalies in Tactical Networks via Graph Learning
Vashist, Akshay
Chadha, Ritu
Kaplan, Michael
Moeltner, Kimberly
[J]. 2012 IEEE MILITARY COMMUNICATIONS CONFERENCE (MILCOM 2012), 2012,
[7] Detecting Anomalies in Online Social Networks using Graph Metrics
Kaur, Ravneet
Singh, Sarbjeet
[J]. 2015 ANNUAL IEEE INDIA CONFERENCE (INDICON), 2015,
[8] Detecting and Diagnosing Anomalies in Cellular Networks using Random Neural Networks
Casas, Pedro
D'Alconzo, Alessandro
Fiadino, Pierdomenico
Callegari, Christian
[J]. 2016 INTERNATIONAL WIRELESS COMMUNICATIONS AND MOBILE COMPUTING CONFERENCE (IWCMC), 2016, : 351 - 356
[9] Detecting and Categorizing Android Malware with Graph Neural Networks
Xu, Peng
Eckert, Claudia
Zarras, Apostolis
[J]. 36TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2021, 2021, : 409 - 412
[10] Detecting Unseen Anomalies in Network Systems by Leveraging Neural Networks
Hashemi, Mohammad J.
Keller, Eric
Tizpaz-Niari, Saeid
[J]. IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, 2023, 20 (03) : 2515 - 2528
