Data Anonymization for Privacy Aware Machine Learning

Cited by: 5
Authors
Jaidan, David Nizar [1]
Carrere, Maxime [2]
Chemli, Zakaria [3]
Poisvert, Remi [4]
Affiliations
[1] Innovat L B Scalian France, Labege, France
[2] Ctr Excellence Datascale Scalian France, Le Haillan, France
[3] Innovat L B Scalian France, Paris, France
[4] Innovat L B Scalian France, Rennes, France
Keywords
Privacy; Anonymization; Machine learning; Text encoding; Natural language processing; Time series; Anomaly detection;
DOI
10.1007/978-3-030-37599-7_60
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The rise in data leaks, attacks, and ransomware in recent years has raised concerns about data security and privacy. This has negatively affected the sharing and publication of data. To address these limitations, innovative techniques are needed for protecting data, especially when it is used in machine learning based data models. In this context, differential privacy is one of the most effective approaches to preserving privacy. However, the scope of differential privacy applications is very limited (e.g., numerical and structured data). Therefore, in this study, we investigate the behavior of differential privacy applied to textual data and time series. The proposed approach was evaluated by comparing two Principal Component Analysis based differential privacy algorithms. Its effectiveness was demonstrated by applying three machine learning models to both anonymized and original data. Their performance was thoroughly evaluated in terms of confidentiality, utility, scalability, and computational efficiency. The PPCA method provides high anonymization quality at the expense of high computation time, while the DPCA method preserves more utility and computes faster. We show that a neural network text representation approach can be combined with differential privacy methods. We also show that it is well within reach to anonymize real-world measurement data from satellite sensors for an anomaly detection task. We believe that our study will significantly motivate the use of differential privacy techniques, which can lead to more data sharing and privacy preservation.
Pages: 725 - 737
Page count: 13
Related papers
50 in total
  • [21] Preserving data privacy in machine learning systems
    El Mestari, Soumia Zohra
    Lenzini, Gabriele
    Demirci, Huseyin
    COMPUTERS & SECURITY, 2024, 137
  • [22] Classification utility aware data stream anonymization
    Sopaoglu, Ugur
    Abul, Osman
    APPLIED SOFT COMPUTING, 2021, 110
  • [23] Privacy Aware Learning
    Duchi, John C.
    Jordan, Michael I.
    Wainwright, Martin J.
    JOURNAL OF THE ACM, 2014, 61 (06) : 1 - 57
  • [24] Privacy Preserving Data Publishing and Data Anonymization Approaches: A Review
    Goswami, Puneet
    Madan, Suman
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND AUTOMATION (ICCCA), 2017, : 139 - 142
  • [25] Data-Blind ML: Building privacy-aware machine learning models without direct data access
    Pastorino, Javier
    Biswas, Ashis Kumer
    2021 IEEE FOURTH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND KNOWLEDGE ENGINEERING (AIKE 2021), 2021, : 95 - 98
  • [26] Data Anonymization Using Pseudonym System to Preserve Data Privacy
    Abd Razak, Shukor
    Nazari, Nur Hafizah Mohd
    Al-Dhaqm, Arafat
    IEEE ACCESS, 2020, 8 (08) : 43256 - 43264
  • [27] FedQAS: Privacy-Aware Machine Reading Comprehension with Federated Learning
    Ait-Mlouk, Addi
    Alawadi, Sadi A.
    Toor, Salman
    Hellander, Andreas
    APPLIED SCIENCES-BASEL, 2022, 12 (06):
  • [28] A decision-support framework for data anonymization with application to machine learning processes
    Caruccio, Loredana
    Desiato, Domenico
    Polese, Giuseppe
    Tortora, Genoveffa
    Zannone, Nicola
    INFORMATION SCIENCES, 2022, 613 : 1 - 32
  • [29] Proximity-Aware Local-Recoding Anonymization with MapReduce for Scalable Big Data Privacy Preservation in Cloud
    Zhang, Xuyun
    Dou, Wanchun
    Pei, Jian
    Nepal, Surya
    Yang, Chi
    Liu, Chang
    Chen, Jinjun
    IEEE TRANSACTIONS ON COMPUTERS, 2015, 64 (08) : 2293 - 2307
  • [30] PRIVACY PRESERVATION IN BIG DATA USING ANONYMIZATION TECHNIQUES
    Karle, Tanashri
    Vora, Deepali
    2017 1ST IEEE INTERNATIONAL CONFERENCE ON DATA MANAGEMENT, ANALYTICS AND INNOVATION (ICDMAI), 2017, : 340 - 343