Data Anonymization for Privacy Aware Machine Learning

Cited by: 5
Authors
Jaidan, David Nizar [1]
Carrere, Maxime [2]
Chemli, Zakaria [3]
Poisvert, Remi [4]
Affiliations
[1] Innovat L B Scalian France, Labege, France
[2] Ctr Excellence Datascale Scalian France, Le Haillan, France
[3] Innovat L B Scalian France, Paris, France
[4] Innovat L B Scalian France, Rennes, France
Keywords
Privacy; Anonymization; Machine learning; Text encoding; Natural language processing; Time series; Anomaly detection;
DOI
10.1007/978-3-030-37599-7_60
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The rise in data leaks, attacks, and ransomware over the last few years has raised concerns about data security and privacy, and this has hampered the sharing and publication of data. Addressing these limitations requires innovative techniques for protecting data, especially when the data are used in machine learning models. In this context, differential privacy is one of the most effective approaches to preserving privacy. However, the scope of its applications is very limited (e.g., numerical and structured data). In this study, we therefore investigate the behavior of differential privacy applied to textual data and time series. The proposed approach was evaluated by comparing two differential privacy algorithms based on Principal Component Analysis. Its effectiveness was demonstrated by applying three machine learning models to both the anonymized and the original data. Performance was thoroughly evaluated in terms of confidentiality, utility, scalability, and computational efficiency. The PPCA method provides high anonymization quality at the expense of long computation time, while the DPCA method preserves more utility and computes faster. We show that a neural network text representation approach can be combined with differential privacy methods. We also show that anonymizing real-world measurement data from satellite sensors for an anomaly detection task is well within reach. We believe that this study will significantly motivate the use of differential privacy techniques, which can lead to more data sharing and better privacy preservation.
Pages: 725-737
Number of pages: 13