Harnessing federated learning for anomaly detection in supercomputer nodes

Cited by: 0
Authors
Farooq, Emmen [1]
Milano, Michela [1]
Borghesi, Andrea [1]
Affiliations
[1] Univ Bologna, DISI, Bologna, Italy
Keywords
Federated learning; Anomaly detection; High-performance computing; Data center; Machine learning
DOI
10.1016/j.future.2024.07.052
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
High-performance computing (HPC) systems are a crucial component of modern society, with a significant impact in areas ranging from economics to scientific research, thanks to their unrivaled computational capabilities. For this reason, the worldwide HPC installation base is growing steeply, with no sign of slowing down. However, these machines are complex (comprising millions of heterogeneous components), hard to manage effectively, and very costly, both in economic investment and in energy consumption. Therefore, maximizing their productivity is of paramount importance. For instance, anomalies and faults can cause significant downtime because they are difficult to detect promptly: many potential sources of issues can prevent computing nodes from functioning correctly. In recent years, several data-driven methods have been proposed to automatically detect anomalies in HPC systems, exploiting the fact that modern supercomputers are typically equipped with fine-grained monitoring infrastructures that collect data characterizing system behavior. It is thus possible to train Machine Learning (ML) models to distinguish normal from anomalous states automatically. In this paper, we contribute to this line of research with a novel idea, namely exploiting Federated Learning (FL) to improve the accuracy of anomaly detection models for HPC nodes. Although FL is not typically used in the HPC context, we show that it can boost several types of underlying ML models, from supervised to unsupervised ones. We demonstrate our approach on a production Tier-0 supercomputer hosted in Italy. Applying FL to anomaly detection improves the average F-score from 0.46 to 0.87. Our research also shows that FL can reduce the data collection time required to build a representative data set, enabling faster deployment of anomaly detection models. ML models need 5 months of training data for effective anomaly detection, whereas using FL reduces the required training set by a factor of 15, to 1.25 weeks.
Pages: 673-685
Number of pages: 13