Automatic Detection of Sensitive Data Using Transformer-Based Classifiers

Cited by: 2
Authors
Petrolini, Michael [1]
Cagnoni, Stefano [1]
Mordonini, Monica [1]
Affiliations
[1] Univ Parma, Dept Engn & Architecture, Parco Area Sci 181a, I-43124 Parma, Italy
Keywords
GDPR; sensitive data; personal data; natural language processing; BERT; transformers
DOI
10.3390/fi14080228
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The General Data Protection Regulation (GDPR) has given EU citizens and residents more control over their personal data, simplifying the regulatory environment for international business and unifying and homogenising privacy legislation within the EU. The regulation applies to all companies that process data of European residents, regardless of where the data are processed or where the company has its registered office, and imposes strict data protection rules. These companies must comply with the GDPR and be aware of the content of the data they manage; this is especially important if they hold sensitive data, that is, any information regarding racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, sex life or sexual orientation, or physical and mental health. Such data are rarely structured; most often they appear within a document such as an email message, a review or a post. It is therefore extremely difficult for a company to know whether it holds sensitive data, and so it risks failing to protect them properly. The goal of the study described in this paper is to use Machine Learning, in particular the Transformer deep-learning model, to develop classifiers capable of detecting documents that are likely to include sensitive data. Additionally, we want the classifiers to recognize the particular type of sensitive topic a document deals with, so that a company can better understand the data it owns. We expect to make the model described in this paper available as a web service, customized to the private data of prospective customers, or in a free-to-use version based on the freely available data set we have built to train the classifiers.
Pages: 15
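The abstract describes a two-level output: first decide whether a document is likely to contain sensitive data, then name the sensitive topic involved. The sketch below only illustrates that output contract and is not the authors' implementation. The category labels are paraphrased from the abstract, while `keyword_scorer`, its cue words, and the `threshold` value are hypothetical placeholders standing in for a trained Transformer classifier.

```python
from typing import Callable, Dict, Optional, Tuple

# Sensitive-data categories paraphrased from the abstract (label names are assumptions).
CATEGORIES = (
    "racial_or_ethnic_origin",
    "political_opinions",
    "religious_or_philosophical_beliefs",
    "trade_union_membership",
    "sexual_life_or_orientation",
    "health",
)

# Any function mapping a document to one score per category can be plugged in,
# for example a wrapper around a fine-tuned Transformer model.
Scorer = Callable[[str], Dict[str, float]]

# Hypothetical cue words, for illustration only.
_CUES = {
    "political_opinions": ("election", "vote", "party"),
    "trade_union_membership": ("union", "strike"),
    "health": ("diagnosis", "therapy", "hospital"),
}


def keyword_scorer(text: str) -> Dict[str, float]:
    """Placeholder scorer: fraction of a category's cue words found in the text."""
    words = set(text.lower().split())
    scores = {}
    for category in CATEGORIES:
        cues = _CUES.get(category, ())
        hits = sum(1 for cue in cues if cue in words)
        scores[category] = hits / len(cues) if cues else 0.0
    return scores


def classify(
    text: str, scorer: Scorer = keyword_scorer, threshold: float = 0.3
) -> Tuple[bool, Optional[str]]:
    """Return (is_sensitive, topic); topic is None when no score reaches the threshold."""
    scores = scorer(text)
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return True, best
    return False, None
```

Keeping the scorer behind a plain callable means the keyword stand-in can be swapped for a real model without changing how callers consume the result.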