A new approach to data differential privacy based on regression models under heteroscedasticity with applications to machine learning repository data

Cited by: 6
Authors
Manchini, Carlos [1]
Ospina, Raydonal [1, 2]
Leiva, Victor [3]
Martin-Barreiro, Carlos [4, 5]
Affiliations
[1] Univ Fed Pernambuco, Dept Stat, CASTLab, Recife, Brazil
[2] Univ Fed Bahia, Dept Estat, IME, Salvador, Brazil
[3] Pontificia Univ Catolica Valparaiso, Sch Ind Engn, Valparaiso, Chile
[4] Escuela Super Politecn Litoral ESPOL, Fac Nat Sci & Math, Guayaquil, Ecuador
[5] Univ Espiritu Santo, Fac Engn, Samborondon, Ecuador
Keywords
Anonymity; Confidentiality; Data breach and fitting; Linear and logistic regressions; Monte Carlo simulation; Perturbations of data; Statistical consistency and modeling; HETEROSKEDASTICITY; ESTIMATOR; INFERENCE
DOI
10.1016/j.ins.2022.10.076
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Generation of massive data in the digital age leads to possible violations of individual privacy. The search for personal data becomes an increasingly recurrent exposure today. The present work corresponds to the area of differential privacy, which guarantees data confidentiality and robustness against invasive identification attacks. This area stands out in the literature for its rigorous mathematical basis capable of quantifying the loss of privacy. A differentially private method based on regression models was developed to prevent inversion attacks while retaining model efficacy. In this paper, we propose a novel approach to improve the data privacy based on regression models under heteroscedasticity, a common aspect, but not studied, in practical situations of differential privacy. The influence of privacy restriction on the statistical performance of the estimators of model parameters is evaluated using Monte Carlo simulations, including a study of performance associated with test rejection rates for the proposed approach. The results of the numerical evaluation show high inferential distortion for stricter privacy restrictions. Empirical illustrations with real-world data are presented to show potential applications. (c) 2022 Elsevier Inc. All rights reserved.
Pages: 280-300
Number of pages: 21
Related papers
50 in total
  • [41] New approach for predicting nitrogen and pigments in maize from hyperspectral data and machine learning models
    da Silva, Bianca Cavalcante
    Prado, Renato de Mello
    Baio, Fabio Henrique Rojo
    Campos, Cid Naudi Silva
    Teodoro, Larissa Pereira Ribeiro
    Teodoro, Paulo Eduardo
    Santana, Dthenifer Cordeiro
    Fernandes, Thiago Feliph Silva
    da Silva Jr, Carlos Antonio
    Loureiro, Elisangela de Souza
    REMOTE SENSING APPLICATIONS-SOCIETY AND ENVIRONMENT, 2024, 33
  • [42] LIBOR meets machine learning: A Lasso regression approach to detecting data irregularities
    Pontines, Victor
    Rummel, Ole
    FINANCE RESEARCH LETTERS, 2023, 55
  • [43] A Hybrid Intrusion Detection System Based on Machine Learning under Differential Privacy Protection
    Shi, Jibo
    Lin, Yun
    Zhang, Zherui
    Yu, Shui
    2021 IEEE 94TH VEHICULAR TECHNOLOGY CONFERENCE (VTC2021-FALL), 2021
  • [44] Machine Learning Applications for Site Characterization Based on CPT Data
    Tsiaousi, Dimitra
    Travasarou, Thaleia
    Drosos, Vasilis
    Ugalde, Jose
    Chacko, Jacob
    GEOTECHNICAL EARTHQUAKE ENGINEERING AND SOIL DYNAMICS V: SLOPE STABILITY AND LANDSLIDES, LABORATORY TESTING, AND IN SITU TESTING, 2018, (293): : 461 - 472
  • [45] Application of machine learning based models in computer network data
    Liu H.
    Applied Mathematics and Nonlinear Sciences, 2024, 9 (01)
  • [46] A New Approach to Data Analysis Using Machine Learning for Cybersecurity
    Hiremath, Shivashankar
    Shetty, Eeshan
    Prakash, Allam Jaya
    Sahoo, Suraj Prakash
    Patro, Kiran Kumar
    Rajesh, Kandala N. V. P. S.
    Plawiak, Pawel
    BIG DATA AND COGNITIVE COMPUTING, 2023, 7 (04)
  • [47] A probabilistic approach to training machine learning models using noisy data
    Alzraiee, Ayman H.
    Niswonger, Richard G.
    ENVIRONMENTAL MODELLING & SOFTWARE, 2024, 179
  • [48] Privacy-Preserving Machine Learning Based Data Analytics on Edge Devices
    Zhao, Jianxin
    Mortier, Richard
    Crowcroft, Jon
    Wang, Liang
    PROCEEDINGS OF THE 2018 AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY (AIES'18), 2018, : 341 - 346
  • [49] A Hybrid Machine Learning Approach for Performance Modeling of Cloud-Based Big Data Applications
    Ataie, Ehsan
    Evangelinou, Athanasia
    Gianniti, Eugenio
    Ardagna, Danilo
    COMPUTER JOURNAL, 2022, 65 (12): : 3123 - 3140
  • [50] Machine learning approach for power consumption model based on monsoon data for smart cities applications
    Sheik Mohideen Shah, S.
    Meganathan, S.
    COMPUTATIONAL INTELLIGENCE, 2021, 37 (03) : 1309 - 1321