Data Anonymization for Privacy Aware Machine Learning

Cited by: 5
Authors
Jaidan, David Nizar [1]
Carrere, Maxime [2]
Chemli, Zakaria [3]
Poisvert, Remi [4]
Affiliations
[1] Innovat L B Scalian France, Labege, France
[2] Ctr Excellence Datascale Scalian France, Le Haillan, France
[3] Innovat L B Scalian France, Paris, France
[4] Innovat L B Scalian France, Rennes, France
Keywords
Privacy; Anonymization; Machine learning; Text encoding; Natural language processing; Time series; Anomaly detection
DOI
10.1007/978-3-030-37599-7_60
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The rise in data leaks, attacks, and ransomware over the last few years has raised concerns about data security and privacy, which in turn has discouraged the sharing and publication of data. Addressing these limitations requires innovative techniques for protecting data, especially when the data are used in machine learning models. In this context, differential privacy is one of the most effective approaches to preserving privacy. However, the scope of its applications is very limited (e.g., numerical and structured data). In this study, we therefore investigate the behavior of differential privacy applied to textual data and time series. The proposed approach was evaluated by comparing two Principal Component Analysis based differential privacy algorithms. Its effectiveness was demonstrated by applying three machine learning models to both the anonymized and the original data. Their performance was evaluated in terms of confidentiality, utility, scalability, and computational efficiency. The PPCA method provides high anonymization quality at the cost of long computation time, while the DPCA method preserves more utility and runs faster. We show that a neural network text representation approach can be combined with differential privacy methods. We also show that real-world measurement data from satellite sensors can be anonymized for an anomaly detection task. We believe that this study will encourage the use of differential privacy techniques, which can lead to more data sharing and better privacy preservation.
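The abstract does not spell out how PPCA or DPCA work, so the following is not the authors' method. It is only a minimal sketch of one generic way to make a PCA step differentially private: add symmetric Gaussian noise to the second-moment matrix before extracting the top principal direction. All function names (`clip_rows`, `noisy_second_moment`, `top_direction`) and parameter values are hypothetical and chosen for illustration. The sketch assumes rows are clipped to unit L2 norm, neighbouring datasets differ by one row, and epsilon is below 1; under those assumptions only the released direction is covered by the (epsilon, delta) guarantee, not the projected records themselves.

```python
import math
import random


def clip_rows(rows):
    """Scale each row so its L2 norm is at most 1 (bounds the sensitivity)."""
    clipped = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        scale = 1.0 if norm <= 1.0 else 1.0 / norm
        clipped.append([v * scale for v in row])
    return clipped


def noisy_second_moment(rows, epsilon, delta, rng):
    """Return X^T X plus symmetric Gaussian noise (Gaussian mechanism).

    Assumption: rows have L2 norm <= 1, so adding or removing one row
    changes X^T X by at most 1 in Frobenius norm.
    """
    d = len(rows[0])
    sigma = math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    m = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    for i in range(d):
        for j in range(i, d):
            m[i][j] += rng.gauss(0.0, sigma)
            m[j][i] = m[i][j]  # keep the noisy matrix symmetric
    return m


def top_direction(m, rng, iters=500):
    """Power iteration for the eigenvector with the largest eigenvalue.

    The noisy matrix may have negative eigenvalues, so it is shifted by a
    Gershgorin bound to make every eigenvalue non-negative first.
    """
    d = len(m)
    shift = max(sum(abs(v) for v in row) for row in m)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(d)) + shift * v[i]
             for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data that varies mostly along the first coordinate.
    data = [[rng.uniform(-1, 1), 0.1 * rng.uniform(-1, 1)] for _ in range(1000)]
    matrix = noisy_second_moment(clip_rows(data), epsilon=0.5, delta=1e-5, rng=rng)
    print(top_direction(matrix, rng))
```

A smaller epsilon means more noise and therefore a less accurate direction, which is the same privacy versus utility trade-off the abstract reports when comparing the two algorithms.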
Pages: 725-737
Number of pages: 13