Plagiarism Meets Paraphrasing: Insights for the Next Generation in Automatic Plagiarism Detection

Cited by: 64
Authors
Barron-Cedeno, Alberto [1]
Vila, Marta [2]
Antonia Marti, M. [2]
Rosso, Paolo [3]
Affiliations
[1] Univ Politecn Cataluna, TALP Res Ctr, ES-08034 Barcelona, Spain
[2] Univ Barcelona, Dept Linguist, CLiC, E-08007 Barcelona, Spain
[3] Univ Politecn Valencia, NLE Lab ELiRF, Dept Informat Syst & Computat, Valencia 46022, Spain
DOI
10.1162/COLI_a_00153
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. Therefore, state-of-the-art plagiarism detectors find it difficult to detect cases of paraphrase plagiarism. In this article, we analyze the relationship between paraphrasing and plagiarism, paying special attention to which paraphrase phenomena underlie acts of plagiarism and which of them are detected by plagiarism detection systems. With this aim in mind, we created the P4P corpus, a new resource that uses a paraphrase typology to annotate a subset of the PAN-PC-10 corpus for automatic plagiarism detection. The results of the Second International Competition on Plagiarism Detection were analyzed in the light of this annotation. The presented experiments show that (i) more complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult, (ii) lexical substitutions are the paraphrase mechanisms used the most when plagiarizing, and (iii) paraphrase mechanisms tend to shorten the plagiarized text. For the first time, the paraphrase mechanisms behind plagiarism have been analyzed, providing critical insights for the improvement of automatic plagiarism detection systems.
Pages: 917 - 948
Number of pages: 32
Related papers
50 in total
  • [1] A Graph Based Automatic Plagiarism Detection Technique to Handle Artificial Word Reordering and Paraphrasing
    Kumar, Niraj
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, CICLING 2014, PART II, 2014, 8404 : 481 - 494
  • [2] Automatic Generation of Summary Obfuscation Corpus for Plagiarism Detection
    Miranda-Jimenez, Sabino
    Stamatatos, Efstathios
    [J]. ACTA POLYTECHNICA HUNGARICA, 2017, 14 (03) : 99 - 112
  • [3] Automatic generation of plagiarism detection among student programs
    Roxas, Rachel Edita
    Lim, Nathalie Rose
    Bautista, Natasja
    [J]. 2006 7TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY BASED HIGHER EDUCATION AND TRAINING, VOLS 1 AND 2, 2006 : 244 - 253
  • [4] Automatic plagiarism detection in obfuscated text
    Alaa Saleh Altheneyan
    Mohamed El Bachir Menai
    [J]. Pattern Analysis and Applications, 2020, 23 : 1627 - 1650
  • [5] Automatic plagiarism detection in obfuscated text
    Altheneyan, Alaa Saleh
    Menai, Mohamed El Bachir
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2020, 23 (04) : 1627 - 1650
  • [6] Automatic Source Code Plagiarism Detection
    Kustanto, Cynthia
    Liem, Inggriani
    [J]. SNPD 2009: 10TH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCES, NETWORKING AND PARALLEL DISTRIBUTED COMPUTING, PROCEEDINGS, 2009 : 481 - 486
  • [7] USING PARAPHRASING CARDS TO REDUCE UNINTENTIONAL PLAGIARISM
    STAHL, N
    KING, JR
    [J]. JOURNAL OF READING, 1991, 34 (07) : 562 - 563
  • [8] Automatic Generation of Benchmarks for Plagiarism Detection Tools using Grammatical Evolution
    Cebrian, Manuel
    Alfonseca, Manuel
    Ortega, Alfonso
    [J]. GECCO 2007: GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, VOL 1 AND 2, 2007 : 2253 - 2253
  • [9] Plagiarism and paraphrasing criteria of college and university professors
    Roig, M
    [J]. ETHICS & BEHAVIOR, 2001, 11 (03) : 307 - 323
  • [10] A linguistic treatment for automatic external plagiarism detection
    Abdi, Asad
    Shamsuddin, Siti Mariyam
    Idris, Norisma
    Alguliyev, Rasim M.
    Aliguliyev, Ramiz M.
    [J]. KNOWLEDGE-BASED SYSTEMS, 2017, 135 : 135 - 146