Towards Effective Paraphrasing for Information Disguise

Cited by: 2

Authors:

Agarwal, Anmol [1]

Gupta, Shrey [1]

Bonagiri, Vamshi [1]

Gaur, Manas [2]

Reagle, Joseph [3]

Kumaraguru, Ponnurangam [1]

Affiliations:

[1] Int Inst Informat Technol, Hyderabad, India

[2] Univ Maryland, Baltimore, MD USA

[3] Northeastern Univ, Boston, MA USA

Source:

ADVANCES IN INFORMATION RETRIEVAL, ECIR 2023, PT II | 2023 / Vol. 13981

Keywords:

Neural information retrieval; Adversarial retrieval; Paraphrasing; Information disguise; Computational ethics;

DOI:

10.1007/978-3-031-28238-6_22

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Information Disguise (ID), a part of computational ethics in Natural Language Processing (NLP), is concerned with best practices of textual paraphrasing to prevent the non-consensual use of authors' posts on the Internet. Research on ID becomes important when authors' written online communication pertains to sensitive domains, e.g., mental health. Over time, researchers have utilized AI-based automated word spinners (e.g., SpinRewriter, WordAI) for paraphrasing content. However, these tools fail to satisfy the purpose of ID as their paraphrased content still leads to the source when queried on search engines. There is limited prior work on judging the effectiveness of paraphrasing methods for ID on search engines or their proxies, neural retriever (NeurIR) models. We propose a framework where, for a given sentence from an author's post, we perform iterative perturbation on the sentence in the direction of paraphrasing with an attempt to confuse the search mechanism of a NeurIR system when the sentence is queried on it. Our experiments involve the subreddit "r/AmItheAsshole" as the source of public content and Dense Passage Retriever as a NeurIR system-based proxy for search engines. Our work introduces a novel method of phrase-importance rankings using perplexity scores and involves multilevel phrase substitutions via beam search. Our multi-phrase substitution scheme succeeds in disguising sentences 82% of the time and hence takes an essential step towards enabling researchers to disguise sensitive content effectively before making it public. We also release the code of our approach (https://github.com/idecir/idecir-Towards-Effective-Paraphrasing-for-Information-Disguise).
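The iterative substitution loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the token-overlap scorer is a toy stand-in for Dense Passage Retriever, the substitution table is hand-written rather than generated, substitutions are single words rather than phrases, and the perplexity-based phrase-importance ranking is not reproduced (every substitutable position is tried). All names (`overlap_score`, `source_is_top`, `disguise`, `CORPUS`, `SUBSTITUTES`) are hypothetical.

```python
from typing import Dict, List, Tuple


def overlap_score(query: List[str], doc: List[str]) -> float:
    """Toy relevance score (assumption): share of query tokens found in the document."""
    doc_tokens = set(doc)
    return sum(tok in doc_tokens for tok in query) / max(len(query), 1)


def source_is_top(query: List[str], corpus: List[List[str]], source_id: int) -> bool:
    """True if no document scores strictly higher than the source for this query."""
    src = overlap_score(query, corpus[source_id])
    return all(overlap_score(query, doc) <= src for doc in corpus)


def disguise(
    sentence: str,
    corpus: List[List[str]],
    source_id: int,
    substitutes: Dict[str, List[str]],
    beam_width: int = 2,
    max_steps: int = 4,
) -> Tuple[str, bool]:
    """Beam search over single-word substitutions.

    Each step expands every sentence in the beam by one substitution and keeps
    the `beam_width` candidates that score lowest against the source document.
    Returns the best candidate and whether the source lost the top position.
    """
    beam = [sentence.split()]
    for _ in range(max_steps):
        candidates = []
        for tokens in beam:
            for i, tok in enumerate(tokens):
                for alt in substitutes.get(tok, []):
                    candidates.append(tokens[:i] + [alt] + tokens[i + 1:])
        if not candidates:
            break  # nothing left to substitute
        candidates.sort(key=lambda toks: overlap_score(toks, corpus[source_id]))
        beam = candidates[:beam_width]
        if not source_is_top(beam[0], corpus, source_id):
            return " ".join(beam[0]), True
    return " ".join(beam[0]), False


# Toy data (hypothetical): document 0 is the source post to be disguised.
CORPUS = [
    "my roommate ate my leftover pizza".split(),
    "a flatmate consumed some remaining pie yesterday".split(),
    "the dog chased a ball".split(),
]
SUBSTITUTES = {
    "roommate": ["flatmate"],
    "ate": ["consumed"],
    "leftover": ["remaining"],
    "pizza": ["pie"],
}

disguised, success = disguise("my roommate ate my leftover pizza", CORPUS, 0, SUBSTITUTES)
print(disguised, success)
```

In this sketch the stopping condition is that some other document outranks the source, which loosely mirrors the abstract's goal of the paraphrase no longer leading back to the source; a real setup would replace `overlap_score` with a neural retriever and generate substitutes with a paraphrasing model.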


Pages: 331-340

Number of pages: 10
