The Cost of Privacy in Asynchronous Differentially-Private Machine Learning

Cited by: 9
Authors
Farokhi, Farhad [1 ]
Wu, Nan [2 ,3 ]
Smith, David [3 ,4 ]
Kaafar, Mohamed Ali [2 ,3 ]
Affiliations
[1] Univ Melbourne, Dept Elect & Elect Engn, Melbourne, Vic 3010, Australia
[2] Macquarie Univ, Dept Comp, Sydney, NSW 2109, Australia
[3] CSIRO's Data61, Eveleigh, NSW 2015, Australia
[4] Australian Natl Univ, Coll Engn & Comp Sci CECS, Canberra, ACT 2600, Australia
Keywords
Training; Data models; Distributed databases; Biological system modeling; Degradation; Privacy; Machine learning; differential privacy; stochastic gradient algorithm; asynchronous; algorithms
DOI
10.1109/TIFS.2021.3050603
Chinese Library Classification
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
We consider training machine learning models using data located on multiple private and geographically-scattered servers with different privacy settings. Due to the distributed nature of the data, communicating with all collaborating private data owners simultaneously may prove challenging or altogether impossible. We consider differentially-private asynchronous algorithms for collaboratively training machine-learning models on multiple private datasets. The asynchronous nature of the algorithms implies that a central learner interacts with the private data owners one-on-one whenever they are available for communication, without needing to aggregate query responses to construct gradients of the entire fitness function. Therefore, the algorithm scales efficiently to many data owners. We define the cost of privacy as the difference between the fitness of a privacy-preserving machine-learning model and the fitness of a machine-learning model trained in the absence of privacy concerns. We demonstrate that the cost of privacy has an upper bound that is inversely proportional to the squared combined size of the training datasets and the squared sum of the privacy budgets. We validate the theoretical results with experiments on financial and medical datasets. The experiments illustrate that collaboration among more than 10 data owners, each holding at least 10,000 records, with privacy budgets greater than or equal to 1 yields a machine-learning model superior to one trained in isolation on a single dataset, illustrating the value of collaboration and the cost of privacy. The number of collaborating datasets can be lowered if the privacy budget is higher.
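The asynchronous scheme the abstract describes can be sketched in a few lines: the central learner repeatedly contacts whichever data owner is available, receives a differentially-private (noise-perturbed) gradient of that owner's local loss alone, and takes a stochastic gradient step, with no per-step aggregation across owners. The sketch below is illustrative only, assuming least-squares loss, gradient clipping, and Laplace noise calibrated to the clipped sensitivity; the function names and the exact noise calibration are assumptions, not the paper's mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_gradient(X, y, w, epsilon, clip=1.0):
    """One owner's private response: the clipped average gradient of its
    local squared loss, plus Laplace noise scaled by the (assumed)
    sensitivity 2 * clip / (n * epsilon)."""
    n = len(y)
    g = X.T @ (X @ w - y) / n                    # least-squares gradient
    g = g / max(1.0, np.linalg.norm(g) / clip)   # clip to bound sensitivity
    return g + rng.laplace(scale=2 * clip / (n * epsilon), size=g.shape)

def async_dp_sgd(owners, dim, epsilon, steps=200, lr=0.05):
    """Central learner: one-on-one updates with whichever owner is
    available at each step -- no aggregation over all owners."""
    w = np.zeros(dim)
    for _ in range(steps):
        X, y = owners[rng.integers(len(owners))]  # next available owner
        w -= lr * noisy_gradient(X, y, w, epsilon)
    return w

# Toy example: three owners whose data share one underlying linear model.
w_true = np.array([1.0, -2.0])
owners = []
for _ in range(3):
    X = rng.normal(size=(5000, 2))
    owners.append((X, X @ w_true + 0.01 * rng.normal(size=5000)))

w_hat = async_dp_sgd(owners, dim=2, epsilon=1.0)
```

Note how the noise scale shrinks as the local dataset size `n` and the budget `epsilon` grow, which is consistent with the abstract's bound: larger datasets and larger privacy budgets reduce the cost of privacy.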
Pages: 2118-2129
Page count: 12