Privacy-Preserving Machine Learning on Apache Spark

Cited by: 1
Authors
Brito, Claudia V. [1, 2]
Ferreira, Pedro G. [1, 3]
Portela, Bernardo L. [1, 3]
Oliveira, Rui C. [1, 2]
Paulo, Joao T. [1, 2]
Affiliations
[1] INESC TEC, P-4200465 Porto, Portugal
[2] Univ Minho, Dept Informat, P-4710057 Braga, Portugal
[3] Univ Porto, Fac Sci, P-4099002 Porto, Portugal
Keywords
Cluster computing; Training; Machine learning; Hardware; Task analysis; Homomorphic encryption; Distributed computing; Trusted computing; Privacy-preserving; machine learning; distributed systems; Apache Spark; trusted execution environments; Intel SGX; SECURITY; ATTACKS
DOI
10.1109/ACCESS.2023.3332222
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The adoption of third-party machine learning (ML) cloud services is highly dependent on the security guarantees and the performance penalty they incur on workloads for model training and inference. This paper explores security/performance trade-offs for the distributed Apache Spark framework and its ML library. Concretely, we build upon a key insight: in specific deployment settings, one can reveal carefully chosen non-sensitive operations (e.g. statistical calculations). This allows us to considerably improve the performance of privacy-preserving solutions without exposing the protocol to pervasive ML attacks. In more detail, we propose Soteria, a system for distributed privacy-preserving ML that leverages Trusted Execution Environments (e.g. Intel SGX) to run computations over sensitive information in isolated containers (enclaves). Unlike previous work, where all ML-related computation is performed at trusted enclaves, we introduce a hybrid scheme, combining computation done inside and outside these enclaves. The experimental evaluation validates that our approach reduces the runtime of ML algorithms by up to 41% when compared to previous related work. Our protocol is accompanied by a security proof and a discussion regarding resilience against a wide spectrum of ML attacks.
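The hybrid scheme described in the abstract can be illustrated with a minimal sketch. It is not Soteria's implementation: the `Operation`, `SimulatedEnclave`, and `run_hybrid` names are hypothetical, the "enclave" is a plain Python object that only records what was routed to it, and the sketch assumes each operation is labelled sensitive or non-sensitive ahead of time. It does not model encryption, attestation, Intel SGX, or Spark.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Operation:
    """A named computation. All names here are hypothetical illustrations."""
    name: str
    fn: Callable[[Sequence[float]], float]
    sensitive: bool  # assumption: sensitivity is decided per operation in advance


@dataclass
class SimulatedEnclave:
    """Stand-in for a trusted enclave; it provides no real isolation."""
    executed: List[str] = field(default_factory=list)

    def run(self, op: Operation, data: Sequence[float]) -> float:
        self.executed.append(op.name)  # record what ran "inside"
        return op.fn(data)


def run_hybrid(
    ops: Sequence[Operation],
    data: Sequence[float],
    enclave: SimulatedEnclave,
) -> Tuple[Dict[str, float], List[str]]:
    """Route sensitive operations to the enclave and run the rest outside."""
    results: Dict[str, float] = {}
    outside: List[str] = []
    for op in ops:
        if op.sensitive:
            results[op.name] = enclave.run(op, data)
        else:
            outside.append(op.name)
            results[op.name] = op.fn(data)
    return results, outside


ops = [
    # Treated as sensitive in this toy example (touches per-record values).
    Operation("gradient_step", lambda xs: sum(x * 0.5 for x in xs), sensitive=True),
    # Treated as a non-sensitive statistical calculation in this toy example.
    Operation("mean", lambda xs: sum(xs) / len(xs), sensitive=False),
]
enclave = SimulatedEnclave()
results, outside = run_hybrid(ops, [1.0, 2.0, 3.0], enclave)
```

In this toy run only `gradient_step` is recorded by the simulated enclave, while `mean` executes outside it. Which operations can safely be revealed is exactly the question the paper's security analysis addresses; the labels above are arbitrary.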
Pages: 127907 - 127930
Page count: 24
Related papers
50 results in total
  • [21] Privacy-Preserving Machine Learning Using EtC Images
    Kawamura, Ayana
    Kinoshita, Yuma
    Kiya, Hitoshi
    [J]. INTERNATIONAL WORKSHOP ON ADVANCED IMAGING TECHNOLOGY (IWAIT) 2020, 2020, 11515
  • [22] Privacy-Preserving Distributed Machine Learning Made Faster
    Jiang, Zoe L.
    Gu, Jiajing
    Wang, Hongxiao
    Wu, Yulin
    Fang, Junbin
    Yiu, Siu-Ming
    Luo, Wenjian
    Wang, Xuan
    [J]. PROCEEDINGS OF THE INAUGURAL ASIACCS 2023 WORKSHOP ON SECURE AND TRUSTWORTHY DEEP LEARNING SYSTEMS, SECTL, 2022,
  • [23] SecureML: A System for Scalable Privacy-Preserving Machine Learning
    Mohassel, Payman
    Zhang, Yupeng
    [J]. 2017 IEEE SYMPOSIUM ON SECURITY AND PRIVACY (SP), 2017, : 19 - 38
  • [24] Re-visited Privacy-Preserving Machine Learning
    Miyaji, Atsuko
    Yamatsuki, Tatsuhiro
    He, Bingchang
    Yamashita, Shintaro
    Mimoto, Tomoaki
    [J]. 2023 20TH ANNUAL INTERNATIONAL CONFERENCE ON PRIVACY, SECURITY AND TRUST, PST, 2023, : 298 - 307
  • [25] A Distributed Trust Framework for Privacy-Preserving Machine Learning
    Abramson, Will
    Hall, Adam James
    Papadopoulos, Pavlos
    Pitropakis, Nikolaos
    Buchanan, William J.
    [J]. TRUST, PRIVACY AND SECURITY IN DIGITAL BUSINESS, TRUSTBUS 2020, 2020, 12395 : 205 - 220
  • [26] Cryptographic Primitives in Privacy-Preserving Machine Learning: A Survey
    Qin, Hong
    He, Debiao
    Feng, Qi
    Khan, Muhammad Khurram
    Luo, Min
    Choo, Kim-Kwang Raymond
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (05) : 1919 - 1934
  • [27] Privacy-preserving machine learning with multiple data providers
    Li, Ping
    Li, Tong
    Ye, Heng
    Li, Jin
    Chen, Xiaofeng
    Xiang, Yang
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 87 : 341 - 350
  • [28] GENoPPML - a framework for genomic privacy-preserving machine learning
    Carpov, Sergiu
    Gama, Nicolas
    Georgieva, Mariya
    Jetchev, Dimitar
    [J]. 2022 IEEE 15TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (IEEE CLOUD 2022), 2022, : 532 - 542
  • [29] Privacy-Preserving Machine Learning as a Service: Challenges and Opportunities
    Zhang, Qiao
    Xiang, Tao
    Cai, Yifei
    Zhao, Zhichao
    Wang, Ning
    Wu, Hongyi
    [J]. IEEE NETWORK, 2023, 37 (06) : 214 - 223
  • [30] Learning in the Dark: Privacy-Preserving Machine Learning using Function Approximation
    Khan, Tanveer
    Michalas, Antonis
    [J]. 2023 IEEE 22ND INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS, TRUSTCOM, BIGDATASE, CSE, EUC, ISCI 2023, 2024, : 62 - 71