A new approach to data differential privacy based on regression models under heteroscedasticity with applications to machine learning repository data

Cited by: 6
Authors
Manchini, Carlos [1]
Ospina, Raydonal [1, 2]
Leiva, Victor [3]
Martin-Barreiro, Carlos [4, 5]
Affiliations
[1] Univ Fed Pernambuco, Dept Stat, CASTLab, Recife, Brazil
[2] Univ Fed Bahia, Dept Estat, IME, Salvador, Brazil
[3] Pontificia Univ Catolica Valparaiso, Sch Ind Engn, Valparaiso, Chile
[4] Escuela Super Politecn Litoral ESPOL, Fac Nat Sci & Math, Guayaquil, Ecuador
[5] Univ Espiritu Santo, Fac Engn, Samborondon, Ecuador
Keywords
Anonymity; Confidentiality; Data breach and fitting; Linear and logistic regressions; Monte Carlo simulation; Perturbations of data; Statistical consistency and modeling; HETEROSKEDASTICITY; ESTIMATOR; INFERENCE;
DOI
10.1016/j.ins.2022.10.076
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Generation of massive data in the digital age leads to possible violations of individual privacy. The search for personal data becomes an increasingly recurrent exposure today. The present work corresponds to the area of differential privacy, which guarantees data confidentiality and robustness against invasive identification attacks. This area stands out in the literature for its rigorous mathematical basis capable of quantifying the loss of privacy. A differentially private method based on regression models was developed to prevent inversion attacks while retaining model efficacy. In this paper, we propose a novel approach to improve the data privacy based on regression models under heteroscedasticity, a common aspect, but not studied, in practical situations of differential privacy. The influence of privacy restriction on the statistical performance of the estimators of model parameters is evaluated using Monte Carlo simulations, including a study of performance associated with test rejection rates for the proposed approach. The results of the numerical evaluation show high inferential distortion for stricter privacy restrictions. Empirical illustrations with real-world data are presented to show potential applications. (c) 2022 Elsevier Inc. All rights reserved.
Pages: 280-300
Number of pages: 21