Plagiarism Meets Paraphrasing: Insights for the Next Generation in Automatic Plagiarism Detection

Cited by: 64
Authors
Barron-Cedeno, Alberto [1]
Vila, Marta [2]
Antonia Marti, M. [2]
Rosso, Paolo [3]
Affiliations
[1] Univ Politecn Cataluna, TALP Res Ctr, ES-08034 Barcelona, Spain
[2] Univ Barcelona, Dept Linguist, CLiC, E-08007 Barcelona, Spain
[3] Univ Politecn Valencia, NLE Lab ELiRF, Dept Informat Syst & Computat, Valencia 46022, Spain
DOI
10.1162/COLI_a_00153
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. Therefore, state-of-the-art plagiarism detectors find it difficult to detect cases of paraphrase plagiarism. In this article, we analyze the relationship between paraphrasing and plagiarism, paying special attention to which paraphrase phenomena underlie acts of plagiarism and which of them are detected by plagiarism detection systems. With this aim in mind, we created the P4P corpus, a new resource that uses a paraphrase typology to annotate a subset of the PAN-PC-10 corpus for automatic plagiarism detection. The results of the Second International Competition on Plagiarism Detection were analyzed in the light of this annotation. The presented experiments show that (i) more complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult, (ii) lexical substitutions are the paraphrase mechanisms used the most when plagiarizing, and (iii) paraphrase mechanisms tend to shorten the plagiarized text. For the first time, the paraphrase mechanisms behind plagiarism have been analyzed, providing critical insights for the improvement of automatic plagiarism detection systems.
Pages: 917-948
Number of pages: 32
Related papers
50 in total
  • [31] Multilingual plagiarism detection
    Ceska, Zdenek
    Toman, Michal
    Jezek, Karel
    [J]. ARTIFICIAL INTELLIGENCE: METHODOLOGY, SYSTEMS, AND APPLICATIONS, 2008, 5253 : 83 - 92
  • [32] Investigating academic plagiarism: A forensic linguistics approach to plagiarism detection
    Sousa-Silva, Rui
    [J]. INTERNATIONAL JOURNAL FOR EDUCATIONAL INTEGRITY, 2014, 10 (01): 31 - 41
  • [33] On Automatic Plagiarism Detection Based on n-Grams Comparison
    Barron-Cedeno, Alberto
    Rosso, Paolo
    [J]. ADVANCES IN INFORMATION RETRIEVAL, PROCEEDINGS, 2009, 5478 : 696 - 700
  • [34] Intrinsic plagiarism detection
    Eissen, Sven Meyer zu
    Stein, Benno
    [J]. ADVANCES IN INFORMATION RETRIEVAL, 2006, 3936 : 565 - 569
  • [35] Plagiarism through Paraphrasing Tools-The Story of One Plagiarized Text
    Ansorge, Libor
    Ansorgeova, Klara
    Sixsmith, Mark
    [J]. PUBLICATIONS, 2021, 9 (04)
  • [36] AuDeNTES: Automatic detection of teNtative plagiarism according to a rEference solution
    Department of Informatics, Systems and Communication, University of Milano Bicocca, viale Sarca 336, 20126 Milano, Italy
    [J]. ACM J. Trans. Comput. Educ., 2012, 1
  • [37] When the plagiarism of instructors meets copyright law
    Holme, Thomas
    [J]. ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2009, 237
  • [38] Similarity, does it necessary mean plagiarism? Stop intentional and exaggerated paraphrasing
    Shohda, Eslam Elsayed Ali
    [J]. NEPAL JOURNAL OF EPIDEMIOLOGY, 2021, 11 (04): 1130 - 1131
  • [39] When college students' attempts at paraphrasing become instances of potential plagiarism
    Roig, M
    [J]. PSYCHOLOGICAL REPORTS, 1999, 84 (03) : 973 - 982
  • [40] Constructing an Academic Thai Plagiarism Corpus for Benchmarking Plagiarism Detection Systems
    Taerungruang, Supawat
    Aroonmanakun, Wirote
    [J]. GEMA ONLINE JOURNAL OF LANGUAGE STUDIES, 2018, 18 (03): 186 - 202