Towards Unsupervised Sudden Data Drift Detection in Federated Learning with Fuzzy Clustering

Authors
Stallmann, Morris [1]
Wilbik, Anna [1]
Weiss, Gerhard [1]
Affiliations
[1] Maastricht Univ, Dept Adv Comp Sci, Maastricht, Netherlands
Keywords
federated learning; fuzzy clustering; unsupervised; drift; drift detection; federated drift detection; federated data drift detection; FCM
DOI
10.1109/FUZZ-IEEE60900.2024.10611883
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Federated learning (FL) is a machine learning (ML) discipline that allows ML models to be trained on distributed data without revealing raw data instances. It promises to enable ML in environments with data sharing constraints, e.g., due to data privacy concerns or other considerations. Data drift and concept drift commonly refer to unpredictable changes in data distributions over time. They are known to degrade an ML model's performance in many real-world scenarios. While drift detection and adaptation have been studied extensively in the non-federated setting, they remain less explored in the FL setting. The private and distributed nature of data in FL makes drift detection much harder, since no entity can oversee all data instances to estimate changes in the global data distribution. In this paper, we propose a novel unsupervised federated data drift detection method based on federated fuzzy c-means clustering and the federated fuzzy Davies-Bouldin index, a global cluster validation metric. First, an initial global data model is learned using the federated fuzzy c-means clustering algorithm. Second, the federated fuzzy Davies-Bouldin index Δ is calculated, estimating how well the data fits the learned model. Third, whenever a new batch of data is available at time t, the fit between the initial data model and the new data is evaluated through the federated fuzzy Davies-Bouldin index Δ(t). Finally, Δ and Δ(t) are compared to detect drift. The method is unsupervised, as it does not require any labels, and it detects global data drift while keeping all data private. We evaluate our method carefully in a controlled environment by simulating multiple federated drift scenarios. We observe promising results: the method rarely raises false positive alarms and detects drift in multiple scenarios. We also observe shortcomings, such as sensitivity to parameter choices and a low detection rate when only a few data points in a new batch of data are affected by drift.
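The four steps in the abstract can be sketched as follows. This is a minimal single-process simulation, not the authors' implementation: the scatter-based form of the fuzzy Davies-Bouldin index, the per-cluster aggregates that clients share, the fixed-threshold comparison, and all function names and parameter values are assumptions made for illustration.

```python
import numpy as np

def memberships(X, V, m=2.0):
    # Standard fuzzy c-means membership of every point in X to every center in V.
    d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def local_stats(X, V, m=2.0):
    # What one client shares (assumed protocol): per-cluster aggregates, no raw rows.
    W = memberships(X, V, m) ** m                      # shape (n_points, n_clusters)
    sq = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    return W.T @ X, W.sum(axis=0), (W * sq).sum(axis=0)

def federated_fcm(clients, V0, m=2.0, rounds=50):
    # Step 1: the server averages client aggregates into global centers.
    V = V0.copy()
    for _ in range(rounds):
        stats = [local_stats(X, V, m) for X in clients]
        num = sum(s[0] for s in stats)
        den = sum(s[1] for s in stats)
        V = num / den[:, None]
    return V

def federated_fuzzy_db(clients, V, m=2.0):
    # Steps 2 and 3: Davies-Bouldin-style index from aggregated fuzzy scatter
    # (assumed definition; lower values mean a better fit to the centers V).
    stats = [local_stats(X, V, m) for X in clients]
    den = sum(s[1] for s in stats)
    scat = sum(s[2] for s in stats)
    S = np.sqrt(scat / den)                            # per-cluster scatter
    dist = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                     # exclude i == j
    return ((S[:, None] + S[None, :]) / dist).max(axis=1).mean()

def detect_drift(delta0, delta_t, threshold=0.2):
    # Step 4: hypothetical rule, flag drift when the fit worsens by more than threshold.
    return bool(delta_t - delta0 > threshold)

def make_clients(rng, shift=0.0, n_clients=3, n=100):
    # Synthetic data: every client sees two Gaussian blobs, optionally shifted.
    centers = np.array([[0.0, 0.0], [5.0, 5.0]]) + shift
    return [np.vstack([c + 0.5 * rng.standard_normal((n, 2)) for c in centers])
            for _ in range(n_clients)]

rng = np.random.default_rng(0)
clients = make_clients(rng)
V0 = np.array([[1.0, 1.0], [4.0, 4.0]])                # fixed initial centers
V = federated_fcm(clients, V0)
delta0 = federated_fuzzy_db(clients, V)                # baseline index
delta_same = federated_fuzzy_db(make_clients(rng), V)  # new batch, no drift
delta_shift = federated_fuzzy_db(make_clients(rng, shift=2.5), V)  # drifted batch
print(detect_drift(delta0, delta_same), detect_drift(delta0, delta_shift))
```

In this toy setup every point in the drifted batch is shifted, which is the easy case. The abstract notes that detection becomes unreliable when only a few points in a batch are affected and that the method is sensitive to parameter choices, so the threshold here should be read as a placeholder.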
Pages: 8