Privacy-Preserving Machine Learning on Apache Spark

Cited by: 1

Authors:

Brito, Claudia V. [1,2]

Ferreira, Pedro G. [1,3]

Portela, Bernardo L. [1,3]

Oliveira, Rui C. [1,2]

Paulo, Joao T. [1,2]

Affiliations:

[1] INESC TEC, P-4200465 Porto, Portugal

[2] Univ Minho, Dept Informat, P-4710057 Braga, Portugal

[3] Univ Porto, Fac Sci, P-4099002 Porto, Portugal

Source:

IEEE ACCESS | 2023 / Vol. 11

Keywords:

Cluster computing; Training; Machine learning; Hardware; Task analysis; Homomorphic encryption; Distributed computing; Trusted computing; Privacy-preserving; machine learning; distributed systems; Apache Spark; trusted execution environments; Intel SGX; SECURITY; ATTACKS

DOI:

10.1109/ACCESS.2023.3332222

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

The adoption of third-party machine learning (ML) cloud services is highly dependent on the security guarantees and the performance penalty they incur on workloads for model training and inference. This paper explores security/performance trade-offs for the distributed Apache Spark framework and its ML library. Concretely, we build upon a key insight: in specific deployment settings, one can reveal carefully chosen non-sensitive operations (e.g. statistical calculations). This allows us to considerably improve the performance of privacy-preserving solutions without exposing the protocol to pervasive ML attacks. In more detail, we propose Soteria, a system for distributed privacy-preserving ML that leverages Trusted Execution Environments (e.g. Intel SGX) to run computations over sensitive information in isolated containers (enclaves). Unlike previous work, where all ML-related computation is performed at trusted enclaves, we introduce a hybrid scheme, combining computation done inside and outside these enclaves. The experimental evaluation validates that our approach reduces the runtime of ML algorithms by up to 41% when compared to previous related work. Our protocol is accompanied by a security proof and a discussion regarding resilience against a wide spectrum of ML attacks.

Pages: 127907-127930

Number of pages: 24

