Privacy-preserving clustering of unstructured big data for cloud-based enterprise search solutions

Cited by: 0
Authors
Zobaed, Sm [1]
Salehi, Mohsen Amini [1]
Affiliation
[1] Univ Louisiana Lafayette, Sch Comp & Informat, High Performance Cloud Comp HPCC Lab, Lafayette, LA 70504 USA
Source
Keywords
cloud trustworthiness; dynamic datasets; encrypted clustering; unstructured big data; INFORMATION
DOI
10.1002/cpe.7160
Chinese Library Classification
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Cloud-based enterprise search services (e.g., Amazon Kendra) appeal to big data owners by providing convenient search solutions over their enterprise big datasets. However, individuals and businesses dealing with confidential big data (e.g., criminal reports) are reluctant to fully embrace such cloud services due to valid data privacy concerns. Solutions based on client-side encryption have been developed to mitigate these concerns. Nonetheless, such solutions hinder data processing, especially data clustering, which is pivotal in applications such as real-time search on large corpora (e.g., big datasets). To cluster encrypted big data, we propose privacy-preserving clustering schemes, called ClusPr, for three forms of unstructured datasets, namely static, semi-dynamic, and dynamic. ClusPr relies on statistical characteristics of the datasets to: (A) determine the suitable number of clusters; (B) populate the clusters with topically relevant tokens; and (C) adapt the cluster set based on the dynamism of the underlying dataset. Experimental results, obtained by evaluating ClusPr against other schemes in the literature on three different test datasets, demonstrate between 30% and 60% improvement in cluster coherency. Moreover, we observe that employing ClusPr within a privacy-preserving enterprise search system can reduce the search time by up to 78%, while improving the search accuracy by up to 35%.
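The abstract does not give ClusPr's algorithms, so the following is only a minimal illustrative sketch of the general idea of clustering encrypted tokens from statistics alone; it is not the authors' method. It assumes tokens are deterministically encrypted on the client (a truncated HMAC stands in for the cipher) and that document co-occurrence is the only statistic used. The function names, the Jaccard similarity measure, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import hashlib
import hmac
from collections import defaultdict


def encrypt_token(token: str, key: bytes) -> str:
    """Deterministic keyed hash, an assumed stand-in for client-side token encryption."""
    return hmac.new(key, token.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def build_index(docs, key):
    """Map each encrypted token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            index[encrypt_token(token, key)].add(doc_id)
    return index


def jaccard(a, b):
    """Overlap of two document-id sets, between 0.0 and 1.0."""
    return len(a & b) / len(a | b)


def cluster_tokens(index, threshold=0.5):
    """Greedy one-pass clustering over encrypted tokens.

    Tokens are visited from most to least frequent. A token joins the first
    cluster whose seed token shares enough documents with it; otherwise it
    seeds a new cluster. The number of clusters is therefore derived from
    the data rather than fixed in advance. Plaintext is never inspected.
    """
    clusters = []  # list of (seed_document_ids, member_tokens)
    for token in sorted(index, key=lambda t: (-len(index[t]), t)):
        doc_ids = index[token]
        for seed_doc_ids, members in clusters:
            if jaccard(doc_ids, seed_doc_ids) >= threshold:
                members.append(token)
                break
        else:
            clusters.append((doc_ids, [token]))
    return [members for _, members in clusters]
```

Calling `cluster_tokens(build_index(docs, key))` on a small corpus groups encrypted tokens that tend to appear in the same documents. Handling the semi-dynamic and dynamic dataset forms mentioned in the abstract would need an update step that this sketch does not attempt.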
Pages: 22