Investigating the Impact of Pre-trained Word Embeddings on Memorization in Neural Networks

Cited by: 6
Authors
Thomas, Aleena [1]
Adelani, David Ifeoluwa [1]
Davody, Ali [1]
Mogadala, Aditya [1]
Klakow, Dietrich [1]
Affiliation
[1] Saarland Univ, Spoken Language Syst Grp, Saarland Informat Campus, Saarbrucken, Germany
Source
Keywords
Differential privacy; Word representations; Unintended memorization
DOI
10.1007/978-3-030-58323-1_30
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Sensitive information present in training data poses a privacy concern for applications, because its unintended memorization during training can make models susceptible to membership inference and attribute inference attacks. In this paper, we investigate this problem in various pre-trained word embeddings (GloVe, ELMo and BERT) with the help of language models built on top of them. Specifically, sequences containing sensitive information, such as a single-word disease name or a 4-digit PIN, are first inserted at random into the training data; a language model is then trained using the word vectors as input features, and memorization is measured with a metric termed exposure. The embedding dimension, the number of training epochs, and the length of the secret information were observed to affect memorization in pre-trained embeddings. Finally, to address the problem, differentially private language models were trained to reduce the exposure of sensitive information.
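The abstract measures memorization with exposure. Below is a minimal sketch of how that metric can be computed. It assumes the definition from Carlini et al.'s "The Secret Sharer" (2019): for a secret drawn from a candidate space R (for example, all 10,000 four-digit PINs), exposure = log2|R| - log2 rank. Here rank is the secret's 1-based position when every candidate is ordered by the trained model's log-perplexity.

The helper names `insert_canary` and `exposure` are hypothetical, and so is the choice to pass in precomputed scores. In the paper's setting, the scores would come from the trained language model, which is not shown here. The toy scores in the example are made up.

```python
import math
import random


def insert_canary(sentences, canary, num_copies, seed=0):
    """Return a copy of `sentences` with `canary` inserted `num_copies` times at random positions."""
    rng = random.Random(seed)
    augmented = list(sentences)
    for _ in range(num_copies):
        # randrange(len + 1) also allows insertion at the very end
        augmented.insert(rng.randrange(len(augmented) + 1), canary)
    return augmented


def exposure(scores, secret):
    """Exposure of `secret`, given a score for every candidate in the space R.

    `scores` maps each candidate filling (e.g. every 4-digit PIN, including
    the true secret) to the model's log-perplexity of the canary sequence
    with that filling. Lower means the model finds it more likely.
    """
    secret_score = scores[secret]
    # 1-based rank: 1 + number of candidates strictly more likely than the secret
    rank = 1 + sum(1 for s in scores.values() if s < secret_score)
    return math.log2(len(scores)) - math.log2(rank)


if __name__ == "__main__":
    # Hypothetical toy scores standing in for a trained language model
    scores = {f"{pin:04d}": 10.0 for pin in range(10_000)}
    scores["1234"] = 2.0  # the inserted secret is the most likely candidate
    print(round(exposure(scores, "1234"), 2))  # prints 13.29
```

For a 4-digit PIN, the maximum exposure is log2(10,000), about 13.3 bits. This value means the model ranks the inserted PIN above every other candidate. A low exposure means the secret is ranked no better than typical candidates.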
Pages: 273-281
Number of pages: 9
Related papers
  • [1] The impact of using pre-trained word embeddings in Sinhala chatbots
    Gamage, Bimsara
    Pushpananda, Randil
    Weerasinghe, Ruvan
    [J]. 2020 20TH INTERNATIONAL CONFERENCE ON ADVANCES IN ICT FOR EMERGING REGIONS (ICTER-2020), 2020, : 161 - 165
  • [2] Improving the accuracy using pre-trained word embeddings on deep neural networks for Turkish text classification
    Aydogan, Murat
    Karci, Ali
    [J]. PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2020, 541
  • [3] Investigating the impact of pre-processing techniques and pre-trained word embeddings in detecting Arabic health information on social media
    Albalawi, Yahya
    Buckley, Jim
    Nikolov, Nikola S.
    [J]. JOURNAL OF BIG DATA, 2021, 8 (01)
  • [5] Disambiguating Clinical Abbreviations using Pre-trained Word Embeddings
    Jaber, Areej
    Martinez, Paloma
    [J]. HEALTHINF: PROCEEDINGS OF THE 14TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES - VOL. 5: HEALTHINF, 2021, : 501 - 508
  • [6] Sentiment analysis based on improved pre-trained word embeddings
    Rezaeinia, Seyed Mahdi
    Rahmani, Rouhollah
    Ghodsi, Ali
    Veisi, Hadi
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2019, 117 : 139 - 147
  • [7] Embodying Pre-Trained Word Embeddings Through Robot Actions
    Toyoda, Minori
    Suzuki, Kanata
    Mori, Hiroki
    Hayashi, Yoshihiko
    Ogata, Tetsuya
    [J]. IEEE ROBOTICS AND AUTOMATION LETTERS, 2021, 6 (02) : 4225 - 4232
  • [8] Gender-preserving Debiasing for Pre-trained Word Embeddings
    Kaneko, Masahiro
    Bollegala, Danushka
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 1641 - 1650
  • [9] Dictionary-based Debiasing of Pre-trained Word Embeddings
    Kaneko, Masahiro
    Bollegala, Danushka
    [J]. 16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 212 - 223
  • [10] An Ensemble Approach of Recurrent Neural Networks using Pre-Trained Embeddings for Playlist Completion
    Monti, Diego
    Palumbo, Enrico
    Rizzo, Giuseppe
    Lisena, Pasquale
    Troncy, Raphael
    Fell, Michael
    Cabrio, Elena
    Morisio, Maurizio
    [J]. RECSYS CHALLENGE'18: PROCEEDINGS OF THE ACM RECOMMENDER SYSTEMS CHALLENGE 2018, 2018,