Data Anonymization for Privacy Aware Machine Learning

Cited by: 5

Authors:

Jaidan, David Nizar [1]

Carrere, Maxime [2]

Chemli, Zakaria [3]

Poisvert, Remi [4]

Affiliations:

[1] Innovat L B Scalian France, Labege, France

[2] Ctr Excellence Datascale Scalian France, Le Haillan, France

[3] Innovat L B Scalian France, Paris, France

[4] Innovat L B Scalian France, Rennes, France

Source:

MACHINE LEARNING, OPTIMIZATION, AND DATA SCIENCE | 2019 / Vol. 11943

Keywords:

Privacy; Anonymization; Machine learning; Text encoding; Natural language processing; Time series; Anomaly detection;

DOI:

10.1007/978-3-030-37599-7_60

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The increase in data leaks, attacks, and ransomware in the last few years has raised concerns about data security and privacy. This has negatively affected the sharing and publication of data. To address these limitations, innovative techniques are needed to protect data, especially when it is used in machine learning-based data models. In this context, differential privacy is one of the most effective approaches to preserving privacy. However, the scope of differential privacy applications is very limited (e.g., numerical and structured data). Therefore, in this study, we investigate the behavior of differential privacy applied to textual data and time series. The proposed approach was evaluated by comparing two Principal Component Analysis-based differential privacy algorithms. Its effectiveness was demonstrated by applying three machine learning models to both the anonymized and the original data. Their performance was thoroughly evaluated in terms of confidentiality, utility, scalability, and computational efficiency. The PPCA method provides high anonymization quality at the expense of long computation time, while the DPCA method preserves more utility and computes faster. We show that a neural network text representation approach can be combined with differential privacy methods. We also show that it is well within reach to anonymize real-world measurement data from satellite sensors for an anomaly detection task. We believe that our study will significantly motivate the use of differential privacy techniques, which can lead to more data sharing and better privacy preservation.


Pages: 725-737

Page count: 13
