Plagiarism Meets Paraphrasing: Insights for the Next Generation in Automatic Plagiarism Detection

Cited by: 64
Authors
Barron-Cedeno, Alberto [1]
Vila, Marta [2]
Antonia Marti, M. [2]
Rosso, Paolo [3]
Affiliations
[1] Univ Politecn Cataluna, TALP Res Ctr, ES-08034 Barcelona, Spain
[2] Univ Barcelona, Dept Linguist, CLiC, E-08007 Barcelona, Spain
[3] Univ Politecn Valencia, NLE Lab ELiRF, Dept Informat Syst & Computat, Valencia 46022, Spain
DOI
10.1162/COLI_a_00153
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. Therefore, state-of-the-art plagiarism detectors find it difficult to detect cases of paraphrase plagiarism. In this article, we analyze the relationship between paraphrasing and plagiarism, paying special attention to which paraphrase phenomena underlie acts of plagiarism and which of them are detected by plagiarism detection systems. With this aim in mind, we created the P4P corpus, a new resource that uses a paraphrase typology to annotate a subset of the PAN-PC-10 corpus for automatic plagiarism detection. The results of the Second International Competition on Plagiarism Detection were analyzed in the light of this annotation. The presented experiments show that (i) more complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult, (ii) lexical substitutions are the paraphrase mechanisms used the most when plagiarizing, and (iii) paraphrase mechanisms tend to shorten the plagiarized text. For the first time, the paraphrase mechanisms behind plagiarism have been analyzed, providing critical insights for the improvement of automatic plagiarism detection systems.
Pages: 917-948
Page count: 32