Automatic Detection of Sensitive Data Using Transformer-Based Classifiers

Cited by: 2
Authors
Petrolini, Michael [1]
Cagnoni, Stefano [1]
Mordonini, Monica [1]
Affiliations
[1] Univ Parma, Dept Engn & Architecture, Parco Area Sci 181a, I-43124 Parma, Italy
Keywords
GDPR; sensitive data; personal data; natural language processing; BERT; transformers
DOI
10.3390/fi14080228
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The General Data Protection Regulation (GDPR) has given EU citizens and residents more control over their personal data, simplifying the regulatory environment for international business and unifying and homogenising privacy legislation within the EU. The regulation applies to all companies that process data of European residents, regardless of where the data are processed and where the company has its registered office, and imposes strict data-protection rules. These companies must comply with the GDPR and be aware of the content of the data they manage; this is especially important if they hold sensitive data, that is, any information regarding racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, sex life or sexual orientation, or physical and mental health. Such data are rarely structured and most often appear within a document such as an email message, a review or a post. It is therefore extremely difficult for a company to know whether it holds sensitive data, and so it risks failing to protect them properly. The goal of the study described in this paper is to use Machine Learning, in particular the Transformer deep-learning model, to develop classifiers capable of detecting documents that are likely to include sensitive data. Additionally, we want the classifiers to recognize the particular type of sensitive topic each document deals with, so that a company has better knowledge of the data it owns. We expect to make the model described in this paper available as a web service, customized to the private data of prospective customers, or even in a free-to-use version based on the freely available data set we have built to train the classifiers.
Pages: 15