Differentially Private n-gram Extraction

Authors
Kim, Kunho [1]
Gopi, Sivakanth [2]
Kulkarni, Janardhan [2]
Yekhanin, Sergey [2]
Affiliations
[1] Microsoft, Redmond, WA 98052 USA
[2] Microsoft Res, Redmond, WA USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, Vol. 34
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We revisit the problem of n-gram extraction in the differential privacy setting. In this problem, given a corpus of private text data, the goal is to release as many n-grams as possible while preserving user-level privacy. Extracting n-grams is a fundamental subroutine in many NLP applications, such as sentence completion and response generation for emails. The problem also arises in other applications such as sequence mining, and is a generalization of the recently studied differentially private set union (DPSU). In this paper, we develop a new differentially private algorithm for this problem which, in our experiments, significantly outperforms the state of the art. Our improvements stem from combining recent advances in DPSU, privacy accounting, and new heuristics for pruning in the tree-based approach initiated by Chen et al. (2012) [CAC12].
Pages: 10