Differentially Private n-gram Extraction

Cited by: 0

Authors:

Kim, Kunho [1]

Gopi, Sivakanth [2]

Kulkarni, Janardhan [2]

Yekhanin, Sergey [2]

Affiliations:

[1] Microsoft, Redmond, WA 98052 USA

[2] Microsoft Res, Redmond, WA USA

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021 / Vol. 34

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We revisit the problem of n-gram extraction in the differential privacy setting. In this problem, given a corpus of private text data, the goal is to release as many n-grams as possible while preserving user-level privacy. Extracting n-grams is a fundamental subroutine in many NLP applications, such as sentence completion and response generation for emails. The problem also arises in other applications, such as sequence mining, and is a generalization of the recently studied differentially private set union (DPSU) problem. In this paper, we develop a new differentially private algorithm for this problem which, in our experiments, significantly outperforms the state of the art. Our improvements stem from combining recent advances in DPSU, privacy accounting, and new heuristics for pruning in the tree-based approach initiated by Chen et al. (2012) [CAC12].
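To make the problem statement concrete, the following is a minimal sketch of a generic noisy-count-and-threshold baseline for releasing n-grams under user-level privacy. It is not the algorithm of the paper, which the abstract describes only at a high level. The function names, the one-token-list-per-user input, the `max_per_user` contribution cap, and the union-bound threshold are all assumptions made for illustration, and the privacy calibration should be checked independently before any real use.

```python
import math
import random
from collections import Counter


def user_ngrams(tokens, n):
    """Return the distinct n-grams (as tuples) in one user's token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def dp_ngram_release(user_docs, n, epsilon, delta, max_per_user, rng=None):
    """Hypothetical baseline: bounded contributions + Laplace noise + threshold.

    user_docs: list of token lists, one list per user (assumption).
    Each user contributes weight 1 to at most `max_per_user` distinct n-grams,
    so the count histogram has L1 sensitivity `max_per_user`.
    """
    rng = rng or random.Random()
    scale = max_per_user / epsilon  # Laplace scale for the histogram
    # Union-bound threshold (assumption): an n-gram held by a single user
    # is released with probability at most delta / max_per_user.
    threshold = 1.0 + scale * math.log(max_per_user / (2.0 * delta))

    counts = Counter()
    for tokens in user_docs:
        grams = sorted(user_ngrams(tokens, n))
        if len(grams) > max_per_user:
            # Cap the user's contribution by uniform subsampling.
            grams = rng.sample(grams, max_per_user)
        counts.update(grams)

    released = set()
    for gram, count in counts.items():
        # The difference of two exponentials is a Laplace(scale) sample.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        if count + noise > threshold:
            released.add(gram)
    return released
```

Only n-grams that were actually observed are ever considered, which is why a threshold is needed in addition to the noise: without it, the mere presence of a rare n-gram in the output would reveal that some user wrote it. A tree-based method such as the one the abstract refers to would instead build longer n-grams from previously released shorter ones; that step is omitted here.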


Pages: 10
