Privacy-preserving Data Classification and Similarity Evaluation for Distributed Systems

Cited by: 14
Authors
Jia, Qi [1]
Guo, Linke [1]
Jin, Zhanpeng [1]
Fang, Yuguang [2]
Affiliations
[1] Binghamton Univ, Dept Elect & Comp Engn, Binghamton, NY 13902 USA
[2] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
Keywords
Privacy Preservation; Data Classification; Similarity Evaluation; Machine Learning
DOI
10.1109/ICDCS.2016.94
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Data classification is a widely used data mining technique for big data analysis. By training on massive data collected from the real world, data classification helps learners discover hidden data patterns. Beyond training, given a model trained on collected data, a user can determine whether a new incoming data item belongs to an existing class; alternatively, multiple distributed entities may collaborate to test the similarity of their trained results. However, because of data locality and privacy concerns, it is infeasible for entities in large-scale distributed systems to share their individual datasets with one another for similarity checks. On the one hand, a trained model is an entity's private asset and may leak private information, so it should be protected from all non-collaborating entities. On the other hand, new incoming data may contain sensitive information that cannot be disclosed directly for classification. To address these privacy issues, we propose a privacy-preserving data classification and similarity evaluation scheme for distributed systems. With our scheme, neither newly arriving data nor trained models are directly revealed during the classification and similarity evaluation procedures. The proposed scheme can be applied to many fields that use data classification and evaluation. Through extensive real-world experiments, we have also evaluated the privacy preservation, feasibility, and efficiency of the proposed scheme.
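To make the two operations in the abstract concrete, here is a plaintext sketch of what the scheme is meant to protect: classifying a new data item with a trained model, and comparing two entities' trained results. It is a minimal illustration under stated assumptions, not the authors' scheme. The abstract names neither the classifier nor the similarity metric, so a linear model and cosine similarity are assumed here, and the function names and weight values are hypothetical. Note that this sketch offers no privacy protection: both the model and the incoming data are visible in the clear, which is exactly the exposure the proposed scheme is designed to avoid.

```python
import math

# ASSUMPTION: a linear classifier (weights w, bias b) stands in for the
# unspecified trained model; the abstract does not say which model is used.
def classify(w, b, x):
    """Return +1 if w.x + b >= 0 (positive class), else -1. Plaintext only."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# ASSUMPTION: cosine similarity stands in for the unspecified similarity
# measure between two entities' trained results (here, weight vectors).
def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v. Plaintext only."""
    dot = sum(a * c for a, c in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(c * c for c in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical trained weights for two entities A and B (made-up values).
    w_a = [0.8, -0.5, 0.3]
    w_b = [0.7, -0.4, 0.5]
    # Entity A classifies a new incoming item with its own model.
    print(classify(w_a, -0.1, [1.0, 0.2, 0.0]))
    # A and B compare their trained results; here both vectors are exposed.
    print(cosine_similarity(w_a, w_b))
```

In a privacy-preserving setting, each of these computations would have to be carried out without either party seeing the other's vector in the clear; the sketch only fixes what is being computed, not how it is protected.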
Pages: 690-699
Number of pages: 10
Related papers (50 in total)
  • [1] Privacy-Preserving LDA Classification over Horizontally Distributed Data
    Khodaparast, Fatemeh
    Sheikhalishahi, Mina
    Haghighi, Hassan
    Martinelli, Fabio
    [J]. INTELLIGENT DISTRIBUTED COMPUTING XIII, 2020, 868: 65-74
  • [2] An Efficient and Privacy-preserving Similarity Evaluation For Big Data Analytics
    Gheid, Zakaria
    Challal, Yacine
    [J]. 2015 IEEE/ACM 8TH INTERNATIONAL CONFERENCE ON UTILITY AND CLOUD COMPUTING (UCC), 2015: 281-289
  • [3] Distributed Privacy-Preserving Minimal Distance Classification
    Krawczyk, Bartosz
    Wozniak, Michal
    [J]. HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, 2013, 8073: 462-471
  • [4] Privacy-Preserving Classification of Data Streams
    Chao, Ching-Ming
    Chen, Po-Zung
    Sun, Chu-Hao
    [J]. JOURNAL OF APPLIED SCIENCE AND ENGINEERING, 2009, 12 (03): 321-330
  • [5] Incentive Compatible Privacy-Preserving Distributed Classification
    Nix, Robert
    Kantarcioglu, Murat
    [J]. IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2012, 9 (04): 451-462
  • [6] Lightweight privacy-Preserving data classification
    Ngoc Hong Tran
    Le-Khac, Nhien-An
    Kechadi, M-Tahar
    [J]. COMPUTERS & SECURITY, 2020, 97
  • [7] Privacy-preserving similarity coefficients for binary data
    Wong, Kok-Seng
    Kim, Myung Ho
    [J]. COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2013, 65 (09): 1280-1290
  • [8] Privacy-preserving classification of Data streams
    Chao, Ching-Ming
    Chen, Po-Zung
    Sun, Chu-Hao
    [J]. TAMKANG JOURNAL OF SCIENCE AND ENGINEERING, 2009, 12 (03): 321-330
  • [9] Research on distributed privacy-preserving data mining
    Jia, Zhe
    Pang, Lei
    Luo, Shoushan
    Xin, Yang
    Zhang, Miao
    [J]. JOURNAL OF CONVERGENCE INFORMATION TECHNOLOGY, 2012, 7 (01): 356-367
  • [10] Privacy-preserving ridge regression on distributed data
    Chen, Yi-Ruei
    Rezapour, Amir
    Tzeng, Wen-Guey
    [J]. INFORMATION SCIENCES, 2018, 451: 34-49