Differentially Private n-gram Extraction

Authors
Kim, Kunho [1]
Gopi, Sivakanth [2]
Kulkarni, Janardhan [2]
Yekhanin, Sergey [2]
Affiliations
[1] Microsoft, Redmond, WA 98052 USA
[2] Microsoft Res, Redmond, WA USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We revisit the problem of n-gram extraction in the differential privacy setting. In this problem, given a corpus of private text data, the goal is to release as many n-grams as possible while preserving user-level privacy. Extracting n-grams is a fundamental subroutine in many NLP applications, such as sentence completion and response generation for emails. The problem also arises in other applications, such as sequence mining, and is a generalization of the recently studied differentially private set union (DPSU) problem. In this paper, we develop a new differentially private algorithm for this problem which, in our experiments, significantly outperforms the state of the art. Our improvements stem from combining recent advances in DPSU, privacy accounting, and new heuristics for pruning in the tree-based approach initiated by Chen et al. (2012) [CAC12].
Pages: 10