Differentially Private n-gram Extraction

Authors
Kim, Kunho [1 ]
Gopi, Sivakanth [2 ]
Kulkarni, Janardhan [2 ]
Yekhanin, Sergey [2 ]
Affiliations
[1] Microsoft, Redmond, WA 98052 USA
[2] Microsoft Res, Redmond, WA USA
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We revisit the problem of n-gram extraction in the differential privacy setting. In this problem, given a corpus of private text data, the goal is to release as many n-grams as possible while preserving user-level privacy. Extracting n-grams is a fundamental subroutine in many NLP applications, such as sentence completion and response generation for emails. The problem also arises in other applications, such as sequence mining, and generalizes the recently studied differentially private set union (DPSU) problem. In this paper, we develop a new differentially private algorithm for this problem which, in our experiments, significantly outperforms the state of the art. Our improvements stem from combining recent advances in DPSU and privacy accounting with new heuristics for pruning in the tree-based approach initiated by Chen et al. (2012) [CAC12].
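To make the problem statement concrete, the following is a minimal sketch of a generic count, noise, and threshold baseline for releasing n-grams under user-level privacy. It is not the algorithm developed in the paper, which builds on DPSU, privacy accounting, and tree-based pruning. All function names and parameters (`extract_ngrams`, `release_threshold`, `dp_ngram_release`, `max_per_user`) are hypothetical, and the threshold follows a standard union-bound argument that is assumed here rather than taken from the source.

```python
import math
import random
from collections import Counter


def extract_ngrams(tokens, n):
    """Return the set of distinct n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def release_threshold(max_per_user, epsilon, delta):
    """Assumed threshold: an n-gram held by a single user passes with
    probability 0.5 * exp(-(t - 1) / scale); a union bound over the
    user's max_per_user n-grams keeps the total below delta."""
    scale = max_per_user / epsilon
    return 1.0 + scale * math.log(max_per_user / (2.0 * delta))


def dp_ngram_release(user_texts, n, epsilon, delta, max_per_user, rng):
    """Illustrative baseline: bound each user's contribution, add Laplace
    noise to every observed count, and keep counts above the threshold."""
    scale = max_per_user / epsilon
    threshold = release_threshold(max_per_user, epsilon, delta)

    counts = Counter()
    for texts in user_texts.values():
        grams = set()
        for text in texts:
            grams |= extract_ngrams(text.split(), n)
        # Each user contributes at most max_per_user distinct n-grams.
        kept = rng.sample(sorted(grams), min(len(grams), max_per_user))
        counts.update(kept)

    released = set()
    for gram in sorted(counts):
        # Difference of two exponentials is Laplace noise with this scale.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        if counts[gram] + noise > threshold:
            released.add(gram)
    return released


if __name__ == "__main__":
    rng = random.Random(0)
    users = {f"user{i}": ["thank you"] for i in range(2000)}
    users["rare"] = ["secret code"]
    print(dp_ngram_release(users, n=2, epsilon=1.0, delta=1e-6,
                           max_per_user=1, rng=rng))
```

In this sketch, n-grams shared by many users clear the threshold easily, while an n-gram that appears in only one user's data is released with probability at most `delta`. A production system would need a vetted noise sampler and a careful privacy analysis.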
Pages: 10