Methods for cross-language plagiarism detection

Cited by: 48
Authors
Barron-Cedeno, Alberto [1, 2]
Gupta, Parth [3 ]
Rosso, Paolo [3 ]
Affiliations
[1] Univ Politecn Cataluna, Talp Res Ctr, E-08028 Barcelona, Spain
[2] Univ Politecn Madrid, Fac Informat, E-28040 Madrid, Spain
[3] Univ Politecn Valencia, NLE Lab ELiRF, Valencia, Spain
Keywords
Automatic plagiarism detection; Cross-language plagiarism; Plagiarism detection architecture; Cross-language similarity; Text re-use analysis; RETRIEVAL;
DOI
10.1016/j.knosys.2013.06.018
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Three reasons explain why plagiarism across languages is on the rise: (i) speakers of under-resourced languages often consult documentation in a foreign language, (ii) people immersed in a foreign country can still consult material written in their native language, and (iii) people are often interested in writing in a language different from their native one. Most efforts for automatically detecting cross-language plagiarism depend on a preliminary translation, which is not always available. In this paper we propose a freely available architecture for plagiarism detection across languages covering the entire process: heuristic retrieval, detailed analysis, and post-processing. On top of this architecture we explore the suitability of three cross-language similarity estimation models: Cross-Language Alignment-based Similarity Analysis (CL-ASA), Cross-Language Character n-Grams (CL-CNG), and Translation plus Monolingual Analysis (T + MA); the three models differ inherently in nature and in the resources they require. The three models are tested extensively under the same conditions on the different plagiarism detection sub-tasks, something never done before. The experiments show that T + MA produces the best results, closely followed by CL-ASA. Still, CL-ASA obtains higher precision, an important factor in plagiarism detection when less user intervention is desired. Crown Copyright (C) 2013 Published by Elsevier B.V. All rights reserved.
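As a rough illustration of the simplest of the three models named in the abstract, the sketch below computes a character n-gram similarity between two texts in different languages. It is not the authors' implementation: the n-gram length of 3, the normalisation step, the raw-count weighting, and the function names `char_ngrams`, `cosine`, and `cl_cng_similarity` are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after a simple normalisation."""
    # Assumed normalisation: lowercase, drop punctuation, collapse whitespace.
    cleaned = re.sub(r"[^\w\s]", "", text.lower())
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # at least one text is shorter than n characters
    return dot / (norm_a * norm_b)


def cl_cng_similarity(x: str, y: str, n: int = 3) -> float:
    """Hypothetical helper: n-gram cosine similarity of two texts."""
    return cosine(char_ngrams(x, n), char_ngrams(y, n))


if __name__ == "__main__":
    english = "The automatic detection of plagiarism"
    spanish = "La detección automática de plagio"
    print(cl_cng_similarity(english, spanish))
```

A measure of this kind needs no translation resources, but it relies on the two languages sharing an alphabet and some vocabulary roots, so it would be expected to work poorly for distant language pairs.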
Pages: 211-217
Page count: 7
Related papers
50 in total
  • [1] Cross-language plagiarism detection
    Potthast, Martin
    Barron-Cedeno, Alberto
    Stein, Benno
    Rosso, Paolo
    [J]. LANGUAGE RESOURCES AND EVALUATION, 2011, 45 (01) : 45 - 62
  • [2] Cross-language plagiarism detection
    Martin Potthast
    Alberto Barrón-Cedeño
    Benno Stein
    Paolo Rosso
    [J]. Language Resources and Evaluation, 2011, 45 : 45 - 62
  • [3] Meta-Analysis of Cross-Language Plagiarism and Self-Plagiarism Detection Methods for Russian-English Language Pair
    Tlitova, Alina
    Toschev, Alexander
    Talanov, Max
    Kurnosov, Vitaliy
    [J]. FRONTIERS IN COMPUTER SCIENCE, 2020, 2
  • [4] Cross-Language Plagiarism Detection Model Based On Multiple Features
    Liu, Gang
    Dong, Yichao
    Li, Guangxi
    [J]. 26TH IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS (IEEE ISCC 2021), 2021,
  • [5] On the Mono- and Cross-Language Detection of Text Reuse and Plagiarism
    Barron-Cedeno, Alberto
    [J]. SIGIR 2010: PROCEEDINGS OF THE 33RD ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH DEVELOPMENT IN INFORMATION RETRIEVAL, 2010, : 914 - 914
  • [6] Word Embedding for High Performance Cross-Language Plagiarism Detection Techniques
    Bouaine, Chaimaa
    Benabbou, Faouzia
    Sadgali, Imane
    [J]. International Journal of Interactive Mobile Technologies, 2023, 17 (10): : 69 - 91
  • [7] Cross-Language Plagiarism Detection Method: Arabic vs. English
    Hattab, Ezz
    [J]. PROCEEDINGS 2015 INTERNATIONAL CONFERENCE ON DEVELOPMENTS IN ESYSTEMS ENGINEERING DESE 2015, 2015, : 141 - 144
  • [8] Cross-language Plagiarism Detection Using BabelNet's Statistical Dictionary
    Franco-Salvador, Marc
    Gupta, Parth
    Rosso, Paolo
    [J]. COMPUTACION Y SISTEMAS, 2012, 16 (04): : 383 - 390
  • [9] A systematic study of knowledge graph analysis for cross-language plagiarism detection
    Franco-Salvador, Marc
    Rosso, Paolo
    Montes-y-Gomez, Manuel
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2016, 52 (04) : 550 - 570
  • [10] A New Approach for Cross-Language Plagiarism Analysis
    Pereira, Rafael Corezola
    Moreira, Viviane P.
    Galante, Renata
    [J]. MULTILINGUAL AND MULTIMODAL INFORMATION ACCESS EVALUATION, 2010, 6360 : 15 - 26