VertiBayes: learning Bayesian network parameters from vertically partitioned data with missing values

Authors
van Daalen, Florian [1, 2]
Ippel, Lianne [2]
Dekker, Andre [1]
Bermejo, Inigo [1]
Affiliations
[1] Maastricht Univ, Med Ctr, GROW Sch Oncol & Reprod, Dept Radiat Oncol MAASTRO, Maastricht, Netherlands
[2] Stat Netherlands, Methodol, Heerlen, Netherlands
Keywords
Federated Learning; Bayesian network; Privacy preserving; Vertically partitioned data; Parameter learning; Structure learning
DOI
10.1007/s40747-024-01424-0
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Federated learning makes it possible to train a machine learning model on decentralized data. Bayesian networks are widely used probabilistic graphical models. While some research has been published on the federated learning of Bayesian networks, publications on Bayesian networks in a vertically partitioned data setting are limited and have important omissions, such as the handling of missing data. We propose a novel method called VertiBayes to train Bayesian networks (structure and parameters) on vertically partitioned data, which can handle missing values as well as an arbitrary number of parties. For structure learning, we adapted the K2 algorithm to use a privacy-preserving scalar product protocol. For parameter learning, we use a two-step approach: first, we learn an intermediate model using maximum likelihood, treating missing values as a special value; then we train a model on synthetic data generated by the intermediate model using the EM algorithm. The privacy guarantees of VertiBayes are equivalent to those provided by the privacy-preserving scalar product protocol used. We show experimentally that VertiBayes produces models comparable to those learnt using traditional algorithms. Finally, we propose two alternative approaches to estimate the performance of the model using vertically partitioned data, and we show in experiments that these give accurate estimates.
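The abstract says that K2 structure learning relies on a privacy-preserving scalar product protocol and that the intermediate model is fitted by maximum likelihood, with missing values treated as a special value. A common way to connect these two ideas is that a joint count across parties equals the scalar product of per-party 0/1 indicator vectors. The minimal Python sketch below illustrates that idea. It is not the VertiBayes implementation. The helper names `indicator`, `scalar_product`, `ml_cpt` and the `"?"` encoding for missing values are hypothetical. The scalar product is computed in the clear rather than with a secure protocol, and the toy records are invented for illustration. The second step, EM on synthetic data, is not shown.

```python
import math

MISSING = "?"  # hypothetical encoding: a missing value becomes its own state


def indicator(column, value):
    """0/1 vector marking the records whose (possibly missing) entry equals `value`."""
    return [1 if (MISSING if v is None else v) == value else 0 for v in column]


def scalar_product(*vectors):
    """Sum over records of the element-wise product of the vectors.

    Computed in the clear here; a privacy-preserving scalar product protocol
    would yield the same number without any party revealing its vector.
    """
    return sum(math.prod(entries) for entries in zip(*vectors))


def ml_cpt(child_col, child_states, parent_col, parent_states):
    """Maximum-likelihood P(child | parent) from counts obtained as scalar products."""
    cpt = {}
    for p in parent_states:
        parent_ind = indicator(parent_col, p)
        counts = {c: scalar_product(indicator(child_col, c), parent_ind) for c in child_states}
        total = sum(counts.values())
        # Fall back to a uniform distribution when a parent state never occurs.
        cpt[p] = {c: (n / total if total else 1 / len(child_states)) for c, n in counts.items()}
    return cpt


# Toy data: party A holds `smoker`, party B holds `cancer`; rows are aligned by record.
smoker = ["yes", "no", "yes", None, "no", "yes"]
cancer = ["yes", "no", "no", "yes", None, "yes"]
states = ["yes", "no", MISSING]
cpt = ml_cpt(cancer, states, smoker, states)
print(cpt["yes"])  # {'yes': 0.6666666666666666, 'no': 0.3333333333333333, '?': 0.0}
```

Because `scalar_product` accepts any number of vectors, a configuration of several parents held by different parties is counted the same way, by multiplying one indicator vector per party. This is also why an arbitrary number of parties poses no structural problem for count-based estimation.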
Pages: 5317-5329
Number of pages: 13