Approximate Data Deletion from Machine Learning Models

Authors
Izzo, Zachary [1]
Smart, Mary Anne [2]
Chaudhuri, Kamalika [2]
Zou, James [3]
Affiliations
[1] Stanford Univ, Dept Math, Stanford, CA 94305 USA
[2] Univ Calif San Diego, Dept CS&E, San Diego, CA USA
[3] Stanford Univ, Dept BDS, Stanford, CA 94305 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deleting data from a trained machine learning (ML) model is a critical task in many applications. For example, we may want to remove the influence of training points that might be out of date or outliers. Regulations such as the EU's General Data Protection Regulation also stipulate that individuals can request to have their data deleted. The naive approach to data deletion is to retrain the ML model on the remaining data, but this is too time-consuming. In this work, we propose a new approximate deletion method for linear and logistic models whose computational cost is linear in the feature dimension d and independent of the number of training data n. This is a significant gain over all existing methods, which all have superlinear time dependence on the dimension. We also develop a new feature-injection test to evaluate the thoroughness of data deletion from ML models.
Pages: 10