The Cost of Privacy in Asynchronous Differentially-Private Machine Learning

Cited by: 9
Authors
Farokhi, Farhad [1 ]
Wu, Nan [2 ,3 ]
Smith, David [3 ,4 ]
Kaafar, Mohamed Ali [2 ,3 ]
Affiliations
[1] Univ Melbourne, Dept Elect & Elect Engn, Melbourne, Vic 3010, Australia
[2] Macquarie Univ, Dept Comp, Sydney, NSW 2109, Australia
[3] CSIRO's Data61, Eveleigh, NSW 2015, Australia
[4] Australian Natl Univ, Coll Engn & Comp Sci CECS, Canberra, ACT 2600, Australia
Keywords
Training; Data models; Distributed databases; Biological system modeling; Degradation; Privacy; Machine learning; differential privacy; stochastic gradient algorithm; asynchronous; algorithms
DOI
10.1109/TIFS.2021.3050603
Chinese Library Classification
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
We consider training machine learning models using data located on multiple private and geographically-scattered servers with different privacy settings. Due to the distributed nature of the data, communicating with all collaborating private data owners simultaneously may prove challenging or altogether impossible. We consider differentially-private asynchronous algorithms for collaboratively training machine-learning models on multiple private datasets. The asynchronous nature of the algorithms implies that a central learner interacts with the private data owners one-on-one whenever they are available for communication, without needing to aggregate query responses to construct gradients of the entire fitness function. Therefore, the algorithm scales efficiently to many data owners. We define the cost of privacy as the difference between the fitness of a privacy-preserving machine-learning model and the fitness of a machine-learning model trained in the absence of privacy concerns. We demonstrate that the cost of privacy has an upper bound that is inversely proportional to the squared combined size of the training datasets and the squared sum of the privacy budgets. We validate the theoretical results with experiments on financial and medical datasets. The experiments illustrate that collaboration among more than 10 data owners, each holding at least 10,000 records, with privacy budgets greater than or equal to 1 yields a machine-learning model superior to one trained in isolation on a single dataset, illustrating the value of collaboration and the cost of privacy. The number of collaborating datasets can be lowered if the privacy budget is higher.
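The asynchronous scheme the abstract describes can be sketched in a few lines: the central learner repeatedly contacts whichever data owner is available, receives a differentially-private (noise-perturbed) gradient of that owner's local loss alone, and takes a stochastic gradient step, with no per-step aggregation across owners. The sketch below is illustrative only, assuming least-squares loss, gradient clipping, and Laplace noise calibrated to the clipped sensitivity; the function names and the exact noise calibration are assumptions, not the paper's mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_gradient(X, y, w, epsilon, clip=1.0):
    """One owner's private response: the clipped average gradient of its
    local squared loss, plus Laplace noise scaled by the (assumed)
    sensitivity 2 * clip / (n * epsilon)."""
    n = len(y)
    g = X.T @ (X @ w - y) / n                    # least-squares gradient
    g = g / max(1.0, np.linalg.norm(g) / clip)   # clip to bound sensitivity
    return g + rng.laplace(scale=2 * clip / (n * epsilon), size=g.shape)

def async_dp_sgd(owners, dim, epsilon, steps=200, lr=0.05):
    """Central learner: one-on-one updates with whichever owner is
    available at each step -- no aggregation over all owners."""
    w = np.zeros(dim)
    for _ in range(steps):
        X, y = owners[rng.integers(len(owners))]  # next available owner
        w -= lr * noisy_gradient(X, y, w, epsilon)
    return w

# Toy example: three owners whose data share one underlying linear model.
w_true = np.array([1.0, -2.0])
owners = []
for _ in range(3):
    X = rng.normal(size=(5000, 2))
    owners.append((X, X @ w_true + 0.01 * rng.normal(size=5000)))

w_hat = async_dp_sgd(owners, dim=2, epsilon=1.0)
```

Note how the noise scale shrinks as the local dataset size `n` and the budget `epsilon` grow, which is consistent with the abstract's bound: larger datasets and larger privacy budgets reduce the cost of privacy.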
Pages: 2118-2129
Page count: 12