Graph neural networks for detecting anomalies in scientific workflows

Cited by: 1
Authors
Jin, Hongwei [1, 6]
Raghavan, Krishnan [1]
Papadimitriou, George [2]
Wang, Cong [3]
Mandal, Anirban [3]
Kiran, Mariam [4]
Deelman, Ewa [2]
Balaprakash, Prasanna [5]
Affiliations
[1] Argonne Natl Lab, Lemont, IL USA
[2] Univ Southern Calif, Los Angeles, CA USA
[3] Renaissance Comp Inst RENCI, Chapel Hill, NC USA
[4] Energy Sci Network ESnet, Berkeley, CA USA
[5] Oak Ridge Natl Lab, Oak Ridge, TN USA
[6] Argonne Natl Lab, Math & Comp Sci Div, 9700 S Cass Ave, Lemont, IL 60439 USA
Keywords
Anomaly detection; machine learning; graph neural networks; scientific workflows; hyperparameter tuning; explainable predictions
DOI
10.1177/10943420231172140
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Identifying and addressing anomalies in complex, distributed systems can be challenging for the reliable execution of scientific workflows. We model these workflows as directed acyclic graphs (DAGs), where the nodes and edges of the DAGs represent jobs and their dependencies, respectively. We develop graph neural networks (GNNs) to learn patterns in the DAGs and to detect anomalies at the node (job) and graph (workflow) levels. We investigate workflow-specific GNN models that are trained on a particular workflow and workflow-agnostic GNN models that are trained across workflows. Our GNN models, which incorporate both individual job features and topological information from the workflow, show improved accuracy and efficiency compared to conventional learning methods for detecting anomalies. When jointly trained on multiple scientific workflows, our GNN models reached an accuracy of more than 80% for workflow-level anomalies and 75% for job-level anomalies. In addition, we illustrate the importance of hyperparameter tuning, which can significantly improve the metrics used to evaluate the GNN models. Finally, we integrate explainable GNN methods to provide insights into the job features in the workflow that cause an anomaly.
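To make the DAG framing in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' model. It assumes a hypothetical four-job workflow with made-up job names and two made-up features per job, and it uses a single untrained mean-aggregation step in place of a learned GNN layer. It only illustrates how job features and dependency structure can be combined into job-level and workflow-level representations.

```python
# Hypothetical toy workflow: job names, dependencies, and feature values are
# illustrative assumptions, not data from the paper.
edges = [
    ("stage_in", "process_a"),
    ("stage_in", "process_b"),
    ("process_a", "merge"),
    ("process_b", "merge"),
]

# One made-up feature vector per job: [runtime_seconds, queue_delay_seconds].
features = {
    "stage_in": [10.0, 1.0],
    "process_a": [40.0, 2.0],
    "process_b": [400.0, 2.0],  # unusually long runtime
    "merge": [20.0, 1.0],
}


def parents_of(edges):
    """Map each job to the list of jobs it directly depends on."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(src, [])
        parents.setdefault(dst, []).append(src)
    return parents


def message_pass(features, parents):
    """One untrained aggregation step (a stand-in for a learned GNN layer):
    average each job's features with the mean of its parents' features."""
    updated = {}
    for job, own in features.items():
        preds = parents.get(job, [])
        if not preds:
            # Root jobs have no incoming messages and keep their features.
            updated[job] = list(own)
            continue
        mean_parent = [
            sum(features[p][i] for p in preds) / len(preds)
            for i in range(len(own))
        ]
        updated[job] = [(o + m) / 2 for o, m in zip(own, mean_parent)]
    return updated


def readout(node_embeddings):
    """Workflow-level embedding: mean over all job embeddings."""
    n = len(node_embeddings)
    dim = len(next(iter(node_embeddings.values())))
    return [sum(v[i] for v in node_embeddings.values()) / n for i in range(dim)]


parents = parents_of(edges)
node_embeddings = message_pass(features, parents)  # job-level representation
graph_embedding = readout(node_embeddings)         # workflow-level representation
```

In this toy setup the slow `process_b` job shifts the embedding of the downstream `merge` job, which is the kind of topological signal a per-job classifier without graph structure would not see. A real model would replace the fixed averaging with trained weights and feed the embeddings to job-level and workflow-level classifiers.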
Pages: 394-411
Number of pages: 18