Privacy protection of textual attributes through a semantic-based masking method

Cited by: 22
Authors
Martinez, Sergio [1]
Sanchez, David [1]
Valls, Aida [1]
Batet, Montserrat [1]
Affiliations
[1] Univ Rovira & Virgili, Dept Comp Sci & Math, Intelligent Technol Adv Knowledge Acquisit ITAKA, Tarragona 43007, Catalonia, Spain
Keywords
Privacy protection; Anonymity; Ontologies; Semantic similarity; Fusion of textual data
DOI
10.1016/j.inffus.2011.03.004
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Using microdata provided by statistical agencies has many benefits from the data mining point of view. However, such data often involve sensitive information that can be directly or indirectly related to individuals. An appropriate anonymisation process is needed to minimise the risk of disclosure. Several masking methods have been developed to deal with continuous-scale numerical data or bounded textual values but approaches to tackling the anonymisation of textual values are scarce and shallow. Because of the importance of textual data in the Information Society, in this paper we present a new masking method for anonymising unbounded textual values based on the fusion of records with similar values to form groups of indistinguishable individuals. Since, from the data exploitation point of view, the utility of textual information is closely related to the preservation of its meaning, our method relies on the structured knowledge representation given by ontologies. This domain knowledge is used to guide the masking process towards the merging that best preserves the semantics of the original data. Because textual data typically consist of large and heterogeneous value sets, our method provides a computationally efficient algorithm by relying on several heuristics rather than exhaustive searches. The method is evaluated with real data in a concrete data mining application that involves solving a clustering problem. We also compare the method with more classical approaches that focus on optimising the value distribution of the dataset. Results show that a semantically grounded anonymisation best preserves the utility of data in both the theoretical and the practical setting, and reduces the probability of record linkage. At the same time, it achieves good scalability with regard to the size of input data. (C) 2011 Elsevier B.V. All rights reserved.
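The idea summarised in the abstract (fuse records with semantically similar textual values into groups of indistinguishable individuals, using an ontology to pick the merge that loses the least meaning) can be illustrated with a minimal sketch. This is not the authors' algorithm: the toy taxonomy `PARENT`, the edge-counting `distance`, and the greedy grouping in `mask` are all assumptions chosen for illustration, and the paper's actual heuristics and similarity measure are not given in this record.

```python
# Hypothetical toy taxonomy (child -> parent); it stands in for a real ontology.
PARENT = {
    "flu": "infection", "cold": "infection",
    "asthma": "respiratory_disease", "bronchitis": "respiratory_disease",
    "infection": "disease", "respiratory_disease": "disease",
    "disease": None,
}

def ancestors(term):
    """Return the path from term up to the root, term first."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path

def lcs(a, b):
    """Least common subsumer: the deepest concept that generalises both terms."""
    seen = set(ancestors(a))
    for concept in ancestors(b):
        if concept in seen:
            return concept
    raise ValueError("terms share no ancestor")

def distance(a, b):
    """Number of taxonomy edges between a and b via their common subsumer."""
    common = lcs(a, b)
    return ancestors(a).index(common) + ancestors(b).index(common)

def mask(values, k):
    """Greedily group values into sets of at least k and generalise each set.

    Assumed heuristic: take the first unassigned record as a seed, group it
    with its k-1 semantically closest records, and replace every value in the
    group by the group's least common subsumer.
    """
    if len(values) < k:
        raise ValueError("need at least k records")
    remaining = list(range(len(values)))
    groups = []
    while len(remaining) >= k:
        seed = remaining[0]
        ranked = sorted(remaining, key=lambda i: distance(values[seed], values[i]))
        group = ranked[:k]
        groups.append(group)
        remaining = [i for i in remaining if i not in group]
    groups[-1].extend(remaining)  # fewer than k leftovers join the last group
    masked = list(values)
    for group in groups:
        label = values[group[0]]
        for i in group[1:]:
            label = lcs(label, values[i])
        for i in group:
            masked[i] = label
    return masked

print(mask(["flu", "asthma", "cold", "bronchitis"], k=2))
```

In this sketch, merging "flu" with "cold" yields "infection" rather than the much more general "disease", which is the sense in which a semantically guided merge preserves more of the original meaning than one that ignores the ontology.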
Pages: 304-314
Number of pages: 11