Automatic Detection of Sensitive Data Using Transformer-Based Classifiers

Times cited: 2
Authors
Petrolini, Michael [1]
Cagnoni, Stefano [1]
Mordonini, Monica [1]
Affiliation
[1] Univ Parma, Dept Engn & Architecture, Parco Area Sci 181a, I-43124 Parma, Italy
Keywords
GDPR; sensitive data; personal data; natural language processing; BERT; transformers
DOI
10.3390/fi14080228
Chinese Library Classification (CLC) code
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The General Data Protection Regulation (GDPR) has given EU citizens and residents more control over their personal data, simplifying the regulatory environment for international business and unifying and homogenising privacy legislation within the EU. The regulation applies to every company that processes data of European residents, regardless of where the data are processed and where the company has its registered office, and it imposes strict data-protection rules. These companies must comply with the GDPR and be aware of the content of the data they manage. This is especially important if they hold sensitive data, that is, any information regarding racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, sexual life or sexual orientation, and physical and mental health. These classes of data are rarely structured; most frequently they appear within a document such as an email message, a review or a post. As a result, it is extremely difficult for a company to know whether it holds sensitive data, and it therefore risks failing to protect them properly. The goal of the study described in this paper is to use Machine Learning, in particular the Transformer deep-learning model, to develop classifiers capable of detecting documents that are likely to include sensitive data. Additionally, we want the classifiers to recognize the particular type of sensitive topic a document deals with, so that a company can gain better knowledge of the data it owns. We expect to make the model described in this paper available as a web service, either customized on the private data of prospective customers or as a free-to-use version based on the freely available data set we built to train the classifiers.
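The abstract asks the classifiers for two outputs per document: whether it likely contains sensitive data and, if so, which sensitive topic it concerns. The minimal sketch below shows one way those two outputs could be read off a single multiclass classifier. It assumes, without confirmation from the abstract, one output head with a hypothetical "non_sensitive" class plus one class per sensitive category listed above, and an illustrative decision `threshold`. The label names, `interpret`, and `Verdict` are hypothetical. Producing the logits themselves (for example, with a fine-tuned BERT model) is out of scope here.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Hypothetical label set: the sensitive categories named in the abstract,
# plus an assumed catch-all class (index 0) for non-sensitive documents.
LABELS = [
    "non_sensitive",
    "racial_or_ethnic_origin",
    "political_opinions",
    "religious_or_philosophical_beliefs",
    "trade_union_membership",
    "sexual_life_or_orientation",
    "health",
]


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over raw classifier scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


@dataclass
class Verdict:
    is_sensitive: bool
    topic: str | None
    confidence: float


def interpret(logits: list[float], threshold: float = 0.5) -> Verdict:
    """Map one document's logits to a sensitive flag plus the likeliest topic.

    The document is flagged when the total probability on the sensitive
    classes reaches `threshold` (an assumed, tunable parameter).
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = softmax(logits)
    p_sensitive = 1.0 - probs[0]
    if p_sensitive < threshold:
        return Verdict(False, None, probs[0])
    # Choose the most probable sensitive category, skipping index 0.
    best = max(range(1, len(LABELS)), key=lambda i: probs[i])
    return Verdict(True, LABELS[best], probs[best])


if __name__ == "__main__":
    # Illustrative logits only; not taken from the paper.
    print(interpret([0.1, 0.0, 0.2, 0.0, 0.0, 0.0, 2.5]))
```

Summing the probability of all sensitive classes before choosing a topic means a document spread across several sensitive topics can still be flagged even when no single topic dominates. A two-classifier design (binary detector, then topic classifier) would be an equally plausible reading of the abstract.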
Pages: 15
Related papers (50 in total)
  • [21] Automatic Road Extraction from Historical Maps Using Transformer-Based SegFormers
    Sertel, Elif
    Hucko, Can Michael
    Kabadayi, Mustafa Erdem
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2024, 13 (12)
  • [22] TMIF: transformer-based multi-modal interactive fusion for automatic rumor detection
    Jiandong Lv
    Xingang Wang
    Cuiling Shao
    Multimedia Systems, 2023, 29 : 2979 - 2989
  • [23] Automatic summarization of cooking videos using transfer learning and transformer-based models
    P. M. Alen Sadique
    R. V. Aswiga
    Discover Artificial Intelligence, 5 (1):
  • [24] Zero-shot Sequence Labeling for Transformer-based Sentence Classifiers
    Bujel, Kamil
    Yannakoudakis, Helen
    Rei, Marek
    REPL4NLP 2021: PROCEEDINGS OF THE 6TH WORKSHOP ON REPRESENTATION LEARNING FOR NLP, 2021 : 195 - 205
  • [25] Transformer-Based Fault Detection Using Pressure Signals for Hydraulic Pumps
    Ran Kim, A.
    Seon Kim, Ha
    Young Kim, Sun
    IEEE ACCESS, 2024, 12 : 145795 - 145808
  • [26] Transformer-Based Parking Slot Detection Using Fixed Anchor Points
    Bui, Quang Huy
    Suhr, Jae Kyu
    IEEE ACCESS, 2023, 11 : 104417 - 104427
  • [27] Fake review detection using transformer-based enhanced LSTM and RoBERTa
    Mohawesh, R.
    Bany Salameh, H.
    Jararweh, Y.
    Alkhalaileh, M.
    Maqsood, S.
    International Journal of Cognitive Computing in Engineering, 2024, 5 : 250 - 258
  • [28] On the Use of Transformer-Based Models for Intent Detection Using Clustering Algorithms
    Moura, Andre
    Lima, Pedro
    Mendonca, Fabio
    Mostafa, Sheikh Shanawaz
    Morgado-Dias, Fernando
    APPLIED SCIENCES-BASEL, 2023, 13 (08):
  • [29] A transformer-based cloud detection approach using Sentinel 2 imageries
    Singh, Rohit
    Biswas, Mantosh
    Pal, Mahesh
    INTERNATIONAL JOURNAL OF REMOTE SENSING, 2023, 44 (10) : 3194 - 3208
  • [30] Efficient crop row detection using transformer-based parameter prediction
    Guo, Zhiming
    Quan, Longzhe
    Sun, Deng
    Lou, Zhaoxia
    Geng, Yuhang
    Chen, Tianbao
    Xue, Yi
    He, Jinbing
    Hou, Pengbiao
    Wang, Chuan
    Wang, Jiakang
    BIOSYSTEMS ENGINEERING, 2024, 246 : 13 - 25