Privacy protection of textual attributes through a semantic-based masking method

Cited by: 22
Authors
Martinez, Sergio [1]
Sanchez, David [1]
Valls, Aida [1]
Batet, Montserrat [1]
Affiliations
[1] Univ Rovira & Virgili, Dept Comp Sci & Math, Intelligent Technol Adv Knowledge Acquisit ITAKA, Tarragona 43007, Catalonia, Spain
Keywords
Privacy protection; Anonymity; Ontologies; Semantic similarity; Fusion of textual data
DOI
10.1016/j.inffus.2011.03.004
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Using microdata provided by statistical agencies has many benefits from the data mining point of view. However, such data often involve sensitive information that can be directly or indirectly related to individuals. An appropriate anonymisation process is needed to minimise the risk of disclosure. Several masking methods have been developed to deal with continuous-scale numerical data or bounded textual values but approaches to tackling the anonymisation of textual values are scarce and shallow. Because of the importance of textual data in the Information Society, in this paper we present a new masking method for anonymising unbounded textual values based on the fusion of records with similar values to form groups of indistinguishable individuals. Since, from the data exploitation point of view, the utility of textual information is closely related to the preservation of its meaning, our method relies on the structured knowledge representation given by ontologies. This domain knowledge is used to guide the masking process towards the merging that best preserves the semantics of the original data. Because textual data typically consist of large and heterogeneous value sets, our method provides a computationally efficient algorithm by relying on several heuristics rather than exhaustive searches. The method is evaluated with real data in a concrete data mining application that involves solving a clustering problem. We also compare the method with more classical approaches that focus on optimising the value distribution of the dataset. Results show that a semantically grounded anonymisation best preserves the utility of data in both the theoretical and the practical setting, and reduces the probability of record linkage. At the same time, it achieves good scalability with regard to the size of input data. (C) 2011 Elsevier B.V. All rights reserved.
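The abstract describes the method only at a high level, so the following is a minimal illustrative sketch rather than the authors' algorithm. It assumes a hypothetical toy taxonomy (`PARENT`), a simple edge-counting distance, and a greedy grouping heuristic: each group of at least `k` semantically close records is fused by replacing its values with their least common subsumer, so that every masked value is shared by at least `k` records. The names `PARENT`, `ancestors`, `lcs`, `distance`, and `mask` are introduced here for illustration only.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent). A real system would rely on
# a domain ontology; this dictionary is an assumption made for illustration.
PARENT = {
    "hiking": "outdoor_sport", "climbing": "outdoor_sport",
    "swimming": "water_sport", "surfing": "water_sport",
    "outdoor_sport": "sport", "water_sport": "sport",
    "reading": "leisure", "cinema": "leisure",
    "sport": "activity", "leisure": "activity",
}

def ancestors(term):
    """Return the path [term, parent, grandparent, ..., root]."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def lcs(a, b):
    """Least common subsumer: the most specific concept covering a and b."""
    up_a = ancestors(a)
    for node in ancestors(b):
        if node in up_a:
            return node
    raise ValueError(f"no common ancestor for {a!r} and {b!r}")

def distance(a, b):
    """Number of taxonomy edges on the path from a to b through their LCS."""
    common = lcs(a, b)
    return ancestors(a).index(common) + ancestors(b).index(common)

def mask(values, k):
    """Greedily fuse records into groups of >= k and generalise each group."""
    if len(values) < k:
        raise ValueError("need at least k records")
    remaining = list(range(len(values)))
    groups = []
    while len(remaining) >= k:
        seed = remaining[0]
        # Heuristic: take the seed plus its k-1 semantically nearest records.
        nearest = sorted(
            remaining, key=lambda i: distance(values[seed], values[i])
        )[:k]
        groups.append(nearest)
        remaining = [i for i in remaining if i not in nearest]
    groups[-1].extend(remaining)  # fewer than k leftovers join the last group
    masked = list(values)
    for group in groups:
        label = values[group[0]]
        for i in group[1:]:
            label = lcs(label, values[i])
        for i in group:
            masked[i] = label
    return masked

if __name__ == "__main__":
    records = ["hiking", "swimming", "climbing", "reading", "surfing", "cinema"]
    masked = mask(records, k=2)
    print(masked)
    print(Counter(masked))  # every masked value occurs at least k times
```

A larger `k` forces more distant values into the same group, so the shared label moves higher in the taxonomy and more meaning is lost. This is the privacy versus utility trade-off that the abstract refers to.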
Pages: 304-314
Page count: 11