Towards Effective Paraphrasing for Information Disguise

Cited by: 2
Authors
Agarwal, Anmol [1]
Gupta, Shrey [1]
Bonagiri, Vamshi [1]
Gaur, Manas [2]
Reagle, Joseph [3]
Kumaraguru, Ponnurangam [1]
Affiliations
[1] Int Inst Informat Technol, Hyderabad, India
[2] Univ Maryland, Baltimore, MD USA
[3] Northeastern Univ, Boston, MA USA
Source
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2023, PT II | 2023 / Vol. 13981
Keywords
Neural information retrieval; Adversarial retrieval; Paraphrasing; Information disguise; Computational ethics;
DOI
10.1007/978-3-031-28238-6_22
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Information Disguise (ID), a part of computational ethics in Natural Language Processing (NLP), is concerned with best practices of textual paraphrasing to prevent the non-consensual use of authors' posts on the Internet. Research on ID becomes important when authors' online writing pertains to sensitive domains, e.g., mental health. Over time, researchers have used AI-based automated word spinners (e.g., SpinRewriter, WordAI) to paraphrase content. However, these tools fail to satisfy the purpose of ID, as their paraphrased content still leads to the source when queried on search engines. There is limited prior work on judging the effectiveness of paraphrasing methods for ID on search engines or on their proxies, neural retriever (NeurIR) models. We propose a framework that, given a sentence from an author's post, iteratively perturbs the sentence in the direction of paraphrasing in an attempt to confuse the search mechanism of a NeurIR system when the sentence is issued to it as a query. Our experiments use the subreddit "r/AmItheAsshole" as the source of public content and Dense Passage Retriever as a NeurIR-based proxy for search engines. Our work introduces a novel method of phrase-importance ranking using perplexity scores and involves multilevel phrase substitutions via beam search. Our multi-phrase substitution scheme succeeds in disguising sentences 82% of the time and hence takes an essential step towards enabling researchers to disguise sensitive content effectively before making it public. We also release the code of our approach at https://github.com/idecir/idecir-Towards-Effective-Paraphrasing-for-Information-Disguise
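The abstract's core loop (rank phrases, substitute them level by level with beam search, and stop once the retriever no longer surfaces the source) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the `CANDIDATES` substitution table, the token-overlap scorer standing in for Dense Passage Retriever, and the rarity-based `importance` function standing in for the perplexity-based ranking are all assumptions made for illustration, as are the function names and the success criterion (source drops out of the top `k`).

```python
from collections import Counter
import math

# Toy corpus (assumption): index 0 is the source post we want to hide.
CORPUS = [
    "my roommate ate my leftover pizza so i hid his keys",
    "my sister ate my lunch so i hid her phone",
    "my neighbor parked in my driveway again",
]

# Hypothetical substitution table; a real system would use a paraphraser.
CANDIDATES = {
    "roommate": ["flatmate", "housemate"],
    "leftover": ["remaining"],
    "pizza": ["lunch", "dinner"],
}

def overlap_score(query, doc):
    """Toy retriever score: shared token count (stand-in for DPR similarity)."""
    q, d = Counter(query.split()), Counter(doc.split())
    return sum((q & d).values())

def source_rank(query, source_idx=0):
    """1-based rank of the source document for this query (ties favor lower index)."""
    scores = [overlap_score(query, doc) for doc in CORPUS]
    order = sorted(range(len(CORPUS)), key=lambda i: (-scores[i], i))
    return order.index(source_idx) + 1

def importance(word):
    """Stand-in importance: corpus rarity. NOT the paper's perplexity-based score."""
    df = sum(word in doc.split() for doc in CORPUS)
    return math.log((1 + len(CORPUS)) / (1 + df))

def disguise(sentence, beam_width=2, max_levels=3, k=1):
    """Beam search over substitutions until the source falls outside the top-k."""
    words = sentence.split()
    # Replaceable positions, most important first (stable for equal scores).
    targets = sorted(
        (i for i, w in enumerate(words) if w in CANDIDATES),
        key=lambda i: -importance(words[i]),
    )[:max_levels]
    beam = [words]
    for pos in targets:
        expanded = []
        for cand in beam:
            expanded.append(cand)  # option: leave this phrase unchanged
            for sub in CANDIDATES[words[pos]]:
                expanded.append(cand[:pos] + [sub] + cand[pos + 1:])
        # Prefer a worse source rank, then a lower score against the source.
        expanded.sort(key=lambda c: (-source_rank(" ".join(c)),
                                     overlap_score(" ".join(c), CORPUS[0])))
        beam = expanded[:beam_width]
        best = " ".join(beam[0])
        if source_rank(best) > k:
            return best
    return None  # no disguise found within the substitution budget

original = "my roommate ate my leftover pizza"
disguised = disguise(original)
print(source_rank(original), "->", disguised)
```

In this toy setting the search needs all three substitution levels before another post outranks the source, which illustrates why a multi-phrase scheme can succeed where a single substitution does not. A real pipeline would swap in a dense retriever and a language-model-based ranking, and would also need to check that the disguised sentence preserves the original meaning.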
Pages: 331-340
Number of pages: 10