Anonymisation Models for Text Data: State of the Art, Challenges and Future Directions

Cited by: 0
Authors
Lison, Pierre [1]
Pilan, Ildiko [1]
Sanchez, David [2]
Batet, Montserrat [2]
Ovrelid, Lilja [3]
Affiliations
[1] Norwegian Comp Ctr, Oslo, Norway
[2] Univ Rovira & Virgili, CYBERCAT, UNESCO Chair Data Privacy, Tarragona, Spain
[3] Univ Oslo, Language Technol Grp, Oslo, Norway
Keywords
DE-IDENTIFICATION; PRIVACY PROTECTION; INFORMATION; SURROGATES; REDACTION; RELEASE;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This position paper investigates the problem of automated text anonymisation, which is a prerequisite for the secure sharing of documents containing sensitive information about individuals. We summarise the key concepts behind text anonymisation and provide a review of current approaches. Anonymisation methods have so far been developed in two fields with little mutual interaction, namely natural language processing and privacy-preserving data publishing. Based on a case study, we outline the benefits and limitations of these approaches and discuss a number of open challenges, such as (1) how to account for multiple types of semantic inferences, (2) how to strike a balance between disclosure risk and data utility and (3) how to evaluate the quality of the resulting anonymisation. We lay out a case for moving beyond sequence labelling models and incorporating explicit measures of disclosure risk into the text anonymisation process.
Pages: 4188-4203
Page count: 16