Differentially Private n-gram Extraction

Cited by: 0

Authors:

Kim, Kunho [1]

Gopi, Sivakanth [2]

Kulkarni, Janardhan [2]

Yekhanin, Sergey [2]

Affiliations:

[1] Microsoft, Redmond, WA 98052 USA

[2] Microsoft Research, Redmond, WA USA

Source:

Advances in Neural Information Processing Systems 34 (NeurIPS 2021) | 2021, Vol. 34

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We revisit the problem of n-gram extraction in the differential privacy setting. In this problem, given a corpus of private text data, the goal is to release as many n-grams as possible while preserving user-level privacy. Extracting n-grams is a fundamental subroutine in many NLP applications, such as sentence completion and response generation for emails. The problem also arises in other applications such as sequence mining, and it generalizes the recently studied differentially private set union (DPSU) problem. In this paper, we develop a new differentially private algorithm for this problem which, in our experiments, significantly outperforms the state of the art. Our improvements stem from combining recent advances in DPSU, privacy accounting, and new heuristics for pruning in the tree-based approach initiated by Chen et al. (2012) [CAC12].
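To make the problem statement concrete, the following is a minimal sketch of a simple count-and-threshold baseline for releasing n-grams privately. It is not the algorithm developed in the paper, which the abstract describes only at a high level. The function names, the one-document-per-user input format, and the choice to let each user contribute a single randomly chosen n-gram are all assumptions made for illustration. Under those assumptions, adding or removing one user changes exactly one count by 1, so Laplace noise of scale 1/epsilon combined with the threshold 1 + ln(1/(2*delta))/epsilon is the usual argument for an (epsilon, delta) guarantee. Python's `random` module is used for readability only and is not suitable for a real deployment.

```python
import math
import random
from collections import Counter


def extract_ngrams(tokens, n):
    """Return the set of distinct n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_ngram_baseline(user_docs, n, epsilon, delta, seed=None):
    """Illustrative baseline (hypothetical, not the paper's method).

    user_docs: one token list per user (assumed input format).
    Each user contributes exactly one of their n-grams, chosen uniformly,
    so one user changes a single count by at most 1.
    Assumes 0 < delta < 0.5 so the threshold exceeds 1.
    """
    rng = random.Random(seed)
    counts = Counter()
    for tokens in user_docs:
        grams = sorted(extract_ngrams(tokens, n))
        if grams:
            counts[rng.choice(grams)] += 1

    # An n-gram held by a single user clears this threshold with
    # probability exactly delta under Laplace(1/epsilon) noise.
    threshold = 1.0 + math.log(1.0 / (2.0 * delta)) / epsilon

    released = set()
    for gram, count in sorted(counts.items()):
        if count + laplace(1.0 / epsilon, rng) > threshold:
            released.add(gram)
    return released


if __name__ == "__main__":
    docs = [["thank", "you"]] * 2000 + [["my", "password"]]
    print(dp_ngram_baseline(docs, n=2, epsilon=1.0, delta=1e-6, seed=0))
```

A baseline like this discards most of each user's data, which is one reason the problem of releasing as many n-grams as possible under the same privacy budget is non-trivial.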


Pages: 10
