Secure principal component analysis in multiple distributed nodes

Cited by: 6
Authors
Won, Hee-Sun [1]
Kim, Sang-Pil [2]
Lee, Sanghun [2]
Choi, Mi-Jung [2]
Moon, Yang-Sae [2]
Affiliations
[1] Elect & Telecommun Res Inst, 218 Gajeong Ro, Taejon 305701, South Korea
[2] Kangwon Natl Univ, Dept Comp Sci, 1 Kangwondaehak Gil, Chuncheon Si 200701, Gangwon, South Korea
Funding
National Research Foundation, Singapore;
Keywords
privacy-preserving data mining; secure principal component analysis; secure multiparty computation; secure similar document detection; Euclidean distance; mining algorithms
DOI
10.1002/sec.1501
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Privacy preservation has become an important issue in big data analysis, and many secure multiparty computation protocols have been proposed to preserve privacy in distributed environments. In this paper, we propose S-PCA, a secure multiparty computation of principal component analysis (PCA) that computes PCA securely among distributed nodes. PCA is widely used in applications including time-series analysis, text mining, and image compression. PCA is usually computed after collecting all data on a single server, but this approach exposes the private data of each node. In contrast, S-PCA computes PCA without disclosing the sensitive data of individual nodes. In S-PCA, the nodes first share their non-sensitive mean vectors and then use the shared mean vectors to compute the covariance matrices and PCA securely. We formally prove the correctness and security of S-PCA and apply it to secure similar document detection. Experimental results show that S-PCA performs slightly worse than PCA because of the cost of guaranteeing security, but it improves the performance of secure similar document detection by up to two orders of magnitude. Copyright (c) 2016 John Wiley & Sons, Ltd.
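The abstract's idea of sharing mean vectors first and then building the covariance matrix can be illustrated with a small sketch. This is not the S-PCA protocol: the secure combination step is not described in this record, so the code below only shows the underlying algebra in the clear. It assumes horizontal partitioning (each node holds some of the rows), and every function name is hypothetical.

```python
# Sketch only: plain (non-secure) algebra behind "share means, then build the
# covariance". Horizontal partitioning and all names here are assumptions;
# the actual S-PCA protocol combines the per-node results securely.

def mean_vector(rows):
    """Column-wise mean of one node's local rows."""
    n, d = len(rows), len(rows[0])
    return [sum(r[j] for r in rows) / n for j in range(d)]

def global_mean(node_means, node_counts):
    """Weighted average of the shared per-node mean vectors."""
    total = sum(node_counts)
    d = len(node_means[0])
    return [sum(m[j] * c for m, c in zip(node_means, node_counts)) / total
            for j in range(d)]

def scatter(rows, mu):
    """Sum of outer products of rows centered on mu (computed locally)."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for r in rows:
        c = [r[j] - mu[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                s[a][b] += c[a] * c[b]
    return s

def pooled_covariance(nodes):
    """Covariance of the union of all nodes' rows, built from per-node parts."""
    counts = [len(rows) for rows in nodes]
    means = [mean_vector(rows) for rows in nodes]  # the shared mean vectors
    mu = global_mean(means, counts)
    d = len(mu)
    total = [[0.0] * d for _ in range(d)]
    for rows in nodes:
        s = scatter(rows, mu)  # a secure protocol would hide this summand
        for a in range(d):
            for b in range(d):
                total[a][b] += s[a][b]
    n = sum(counts)
    return [[v / (n - 1) for v in row] for row in total]

def top_component(cov, iters=200):
    """First principal direction via power iteration (unit length)."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Two hypothetical nodes holding 2-dimensional rows.
nodes = [[[1.0, 2.0], [2.0, 4.1]],
         [[3.0, 5.9], [4.0, 8.2], [5.0, 9.8]]]
cov = pooled_covariance(nodes)
pc1 = top_component(cov)
```

Because every node centers its rows on the same global mean, the per-node scatter matrices add up to the scatter matrix of the pooled data, so the result matches what a single server would compute from all rows.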
Pages: 2348-2358
Page count: 11