Privacy protection of textual attributes through a semantic-based masking method

Cited by: 22
Authors
Martinez, Sergio [1]
Sanchez, David [1]
Valls, Aida [1]
Batet, Montserrat [1]
Affiliations
[1] Univ Rovira & Virgili, Dept Comp Sci & Math, Intelligent Technol Adv Knowledge Acquisit ITAKA, Tarragona 43007, Catalonia, Spain
Keywords
Privacy protection; Anonymity; Ontologies; Semantic similarity; Fusion of textual data
DOI
10.1016/j.inffus.2011.03.004
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Using microdata provided by statistical agencies has many benefits from the data mining point of view. However, such data often involve sensitive information that can be directly or indirectly related to individuals. An appropriate anonymisation process is needed to minimise the risk of disclosure. Several masking methods have been developed to deal with continuous-scale numerical data or bounded textual values but approaches to tackling the anonymisation of textual values are scarce and shallow. Because of the importance of textual data in the Information Society, in this paper we present a new masking method for anonymising unbounded textual values based on the fusion of records with similar values to form groups of indistinguishable individuals. Since, from the data exploitation point of view, the utility of textual information is closely related to the preservation of its meaning, our method relies on the structured knowledge representation given by ontologies. This domain knowledge is used to guide the masking process towards the merging that best preserves the semantics of the original data. Because textual data typically consist of large and heterogeneous value sets, our method provides a computationally efficient algorithm by relying on several heuristics rather than exhaustive searches. The method is evaluated with real data in a concrete data mining application that involves solving a clustering problem. We also compare the method with more classical approaches that focus on optimising the value distribution of the dataset. Results show that a semantically grounded anonymisation best preserves the utility of data in both the theoretical and the practical setting, and reduces the probability of record linkage. At the same time, it achieves good scalability with regard to the size of input data. (C) 2011 Elsevier B.V. All rights reserved.
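The abstract describes the masking idea only at a high level: records with semantically similar textual values are fused into groups of indistinguishable individuals, with an ontology guiding the merge. The following Python sketch illustrates that general idea under stated assumptions and is not the authors' algorithm. The toy taxonomy `PARENT`, the edge-counting `distance`, the minimum group size `k`, and the greedy seed-and-nearest-neighbours grouping are all hypothetical choices made for illustration.

```python
# Hypothetical toy taxonomy (child -> parent); not taken from the paper.
PARENT = {
    "flu": "infection",
    "pneumonia": "infection",
    "infection": "disease",
    "diabetes": "metabolic disorder",
    "gout": "metabolic disorder",
    "metabolic disorder": "disease",
    "disease": None,
}


def ancestors(term):
    """Return the path from `term` up to the taxonomy root, `term` first."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


def lcs(a, b):
    """Least common subsumer: the deepest concept that generalises both a and b."""
    seen = set(ancestors(a))
    for concept in ancestors(b):
        if concept in seen:
            return concept
    raise ValueError("terms do not share a root")


def distance(a, b):
    """Edge-counting distance between a and b, measured through their LCS."""
    common = lcs(a, b)
    return ancestors(a).index(common) + ancestors(b).index(common)


def mask(values, k):
    """Greedily fuse values into groups of at least k records (assumed heuristic).

    Each group is replaced by the least common subsumer of its members, so
    the records in a group become indistinguishable on this attribute.
    If fewer than k records exist in total, the single group is smaller than k.
    """
    remaining = list(range(len(values)))
    masked = list(values)
    while remaining:
        if len(remaining) < 2 * k:
            # Too few records left for two full groups: fuse them all.
            group, remaining = remaining, []
        else:
            seed = remaining[0]
            # The seed itself sorts first because its distance is 0.
            ordered = sorted(remaining, key=lambda i: distance(values[seed], values[i]))
            group = ordered[:k]
            remaining = [i for i in remaining if i not in group]
        general = values[group[0]]
        for i in group[1:]:
            general = lcs(general, values[i])
        for i in group:
            masked[i] = general
    return masked


if __name__ == "__main__":
    print(mask(["flu", "diabetes", "pneumonia", "gout"], k=2))
```

With this toy taxonomy, "flu" and "pneumonia" are fused into "infection" rather than the far more general "disease", which is the sense in which a semantically guided merge loses less meaning than one that ignores the ontology.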
Pages: 304-314
Number of pages: 11
Related papers
50 in total
  • [1] Semantic-Based Customizable Location Privacy Protection Scheme
    Lv, Xin
    Shi, Haitao
    Wang, Aili
    Zeng, Tao
    Wu, Zhongzhong
    [J]. 2018 17TH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED COMPUTING AND APPLICATIONS FOR BUSINESS ENGINEERING AND SCIENCE (DCABES), 2018, : 148 - 154
  • [2] Semantic-Based Customizable Location Privacy Protection Scheme
    Lv, Xin
    Shi, Haitao
    Wang, Aili
    Zeng, Tao
    Wu, Zhongzhong
    [J]. Proceedings - 2018 17th International Symposium on Distributed Computing and Applications for Business Engineering and Science, DCABES 2018, 2018, : 148 - 154
  • [3] A semantic-based user privacy protection framework for Web services
    Tumer, A
    Dogac, A
    Toroslu, IH
    [J]. INTELLIGENT TECHNIQUES FOR WEB PERSONALIZATION, 2005, 3169 : 289 - 305
  • [4] Semantic-based Privacy Protection of Electronic Health Records for Collaborative Research
    Lu, Yang
    Sinnott, Richard O.
    [J]. 2016 IEEE TRUSTCOM/BIGDATASE/ISPA, 2016, : 519 - 526
  • [5] A Semantic-based Inference Control Algorithm for RDF Stores Privacy Protection
    Qi, Yuying
    Zhu, Tao
    Ning, Huansheng
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENCE AND SAFETY FOR ROBOTICS (ISR), 2018, : 178 - 183
  • [6] A semantic-based inference control algorithm for OWL repository privacy protection
    Qi, Yuying
    Yao, Xuanxia
    Zhu, Tao
    Ning, Huansheng
    [J]. COMPUTER NETWORKS, 2019, 156 : 1 - 8
  • [7] Textual Knowledge Representation through the Semantic-based Graph Structure in Clustering Applications
    Wu, Jiangning
    Dang, Yanzhong
    Pan, Donghua
    Xuan, Zhaoguo
    Liu, Qiaofeng
    [J]. 43RD HAWAII INTERNATIONAL CONFERENCE ON SYSTEMS SCIENCES VOLS 1-5 (HICSS 2010), 2010, : 3398 - 3405
  • [8] Semantic-based privacy settings negotiation and management
    Sanchez, Odnan Ref
    Torre, Ilaria
    Knijnenburg, Bart P.
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2020, 111 : 879 - 898
  • [9] A Semantic-based Approach to Reduce the Reading Time of Privacy Policies
    Kaur, Jasmin
    Dara, Rozita
    Chaturvedi, Ritu
    [J]. 2022 19TH ANNUAL INTERNATIONAL CONFERENCE ON PRIVACY, SECURITY & TRUST (PST), 2022,
  • [10] A Semantic-Based Approach for Privacy-Preserving in Trajectory Publishing
    Ye, Ayong
    Zhang, Qiang
    Diao, Yiqing
    Zhang, Jiaomei
    Deng, Huina
    Cheng, Baorong
    [J]. IEEE ACCESS, 2020, 8 : 184965 - 184975