Differentially Private n-gram Extraction

Times cited: 0

Authors:

Kim, Kunho [1]

Gopi, Sivakanth [2]

Kulkarni, Janardhan [2]

Yekhanin, Sergey [2]

Affiliations:

[1] Microsoft, Redmond, WA 98052 USA

[2] Microsoft Research, Redmond, WA USA

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021 / Vol. 34

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We revisit the problem of n-gram extraction in the differential privacy setting. In this problem, given a corpus of private text data, the goal is to release as many n-grams as possible while preserving user-level privacy. Extracting n-grams is a fundamental subroutine in many NLP applications, such as sentence completion and email response generation. The problem also arises in other applications, such as sequence mining. It generalizes the recently studied problem of differentially private set union (DPSU). In this paper, we develop a new differentially private algorithm for this problem which, in our experiments, significantly outperforms the state of the art. Our improvements come from combining recent advances in DPSU and privacy accounting with new pruning heuristics for the tree-based approach initiated by Chen et al. (2012) [CAC12].
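The following minimal Python sketch makes the problem statement concrete. It shows a generic DP set-union style baseline applied to n-grams and is not the paper's tree-based algorithm.

The baseline works in four steps:

1. Each user's distinct n-grams are capped at a hypothetical `max_contrib` limit.
2. Each user spreads a total weight of 1 over the n-grams they keep. Adding or removing one user therefore changes the weight vector by at most 1 in L1 norm.
3. Laplace noise with scale 1/ε is added to each weight.
4. Only n-grams whose noisy weight exceeds `threshold` are released.

All function and parameter names here are illustrative assumptions. The sketch assumes `threshold` has already been calibrated to the target (ε, δ) elsewhere. It does not derive that value, so it makes no privacy guarantee on its own.

```python
import random
from collections import defaultdict


def extract_ngrams(tokens, n):
    """Return the set of contiguous n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_ngram_union(user_texts, n, max_contrib, epsilon, threshold, seed=None):
    """Hypothetical DP set-union baseline over n-grams (illustrative only).

    user_texts: dict mapping user id -> list of token lists (that user's text).
    max_contrib: hypothetical cap on distinct n-grams kept per user.
    threshold: release cutoff, ASSUMED to be calibrated to (epsilon, delta)
        elsewhere; this sketch does not compute it.
    """
    rng = random.Random(seed)
    weights = defaultdict(float)
    for docs in user_texts.values():
        grams = set()
        for tokens in docs:
            grams |= extract_ngrams(tokens, n)
        if not grams:
            continue
        # Keep at most max_contrib of this user's n-grams, chosen at random.
        chosen = rng.sample(sorted(grams), min(max_contrib, len(grams)))
        # Spread a total weight of 1 over the kept n-grams, so one user
        # changes the weight vector by at most 1 in L1 norm.
        for g in chosen:
            weights[g] += 1.0 / len(chosen)
    released = set()
    for g, w in weights.items():
        # Laplace noise with scale 1/epsilon matches L1 sensitivity 1.
        if w + laplace(1.0 / epsilon, rng) > threshold:
            released.add(g)
    return released
```

Usage example: call `dp_ngram_union(users, n=2, max_contrib=2, epsilon=1000.0, threshold=10.0)` with 100 users who all wrote "the cat sat" and one user who wrote "zebra quokka". Because `epsilon` is very large, the noise is negligible. Both shared bigrams carry weight 50 and are released. The single-user bigram has weight 1 and stays below the threshold. Realistic ε values are far smaller, so the noise and the threshold calibration both matter much more in practice.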


Pages: 10
