Differentially Private n-gram Extraction

被引:0
|
作者
Kim, Kunho [1 ]
Gopi, Sivakanth [2 ]
Kulkarni, Janardhan [2 ]
Yekhanin, Sergey [2 ]
机构
[1] Microsoft, Redmond, WA 98052 USA
[2] Microsoft Res, Redmond, WA USA
来源
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021年 / 34卷
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We revisit the problem of n-gram extraction in the differential privacy setting. In this problem, given a corpus of private text data, the goal is to release as many n-grams as possible while preserving user level privacy. Extracting n-grams is a fundamental subroutine in many NLP applications such as sentence completion, response generation for emails etc. The problem also arises in other applications such as sequence mining, and is a generalization of recently studied differentially private set union (DPSU). In this paper, we develop a new differentially private algorithm for this problem which, in our experiments, significantly outperforms the state-of-the-art. Our improvements stem from combining recent advances in DPSU, privacy accounting, and new heuristics for pruning in the tree-based approach initiated by Chen et al. (2012) [CAC12].
引用
收藏
页数:10
相关论文
共 50 条
  • [31] Classification of facemarks using N-gram
    Yamada, Thichi
    Tsuchiya, Seiji
    Kuroiwa, Shiongo
    Ren, Fuji
    PROCEEDINGS OF THE 2007 IEEE INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING (NLP-KE'07), 2007, : 322 - +
  • [32] Distributing N-Gram Graphs for Classification
    Kontopoulos, Ioannis
    Giannakopoulos, George
    Varlamis, Iraklis
    NEW TRENDS IN DATABASES AND INFORMATION SYSTEMS, ADBIS 2017, 2017, 767 : 3 - 11
  • [33] SEARCHING FOR TEXT - SEND AN N-GRAM
    KIMBRELL, RE
    BYTE, 1988, 13 (05): : 297 - &
  • [34] Text mining with n-gram variables
    Schonlau, Matthias
    Guenther, Nick
    Sucholutsky, Ilia
    STATA JOURNAL, 2017, 17 (04): : 866 - 881
  • [35] Uniquely decodable n-gram embeddings
    Kontorovich, L
    THEORETICAL COMPUTER SCIENCE, 2004, 329 (1-3) : 271 - 284
  • [36] Semantic N-Gram Topic Modeling
    Kherwa, Pooja
    Bansal, Poonam
    EAI ENDORSED TRANSACTIONS ON SCALABLE INFORMATION SYSTEMS, 2020, 7 (26) : 1 - 12
  • [37] N-gram Analysis of a Mongolian Text
    Altangerel, Khuder
    Tsend, Ganbat
    Jalsan, Khash-Erdene
    IFOST 2008: PROCEEDING OF THE THIRD INTERNATIONAL FORUM ON STRATEGIC TECHNOLOGIES, 2008, : 258 - 259
  • [38] On compressing n-gram language models
    Hirsimaki, Teemu
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 949 - 952
  • [39] N-GRAM ANALYSIS IN THE ENGINEERING DOMAIN
    Leary, Martin
    Pearson, Geoff
    Burvill, Colin
    Mazur, Maciej
    Subic, Aleksandar
    PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON ENGINEERING DESIGN (ICED 11): IMPACTING SOCIETY THROUGH ENGINEERING DESIGN, VOL 6: DESIGN INFORMATION AND KNOWLEDGE, 2011, 6 : 414 - 423
  • [40] Supervised N-gram Topic Model
    Kawamae, Noriaki
    WSDM'14: PROCEEDINGS OF THE 7TH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, 2014, : 473 - 482