Candidate document retrieval for cross-lingual plagiarism detection using two-level proximity information

Cited by: 24
Authors
Ehsan, Nava [1]
Shakery, Azadeh [1, 2]
Affiliations
[1] Univ Tehran, Sch Elect & Comp Engn, Coll Engn, Tehran, Iran
[2] Inst Res Fundamental Sci IPM, Sch Comp Sci, Tehran, Iran
Keywords
Candidate document retrieval; Cross-language plagiarism detection; Text segmentation; Proximity-based retrieval; TEXT;
DOI
10.1016/j.ipm.2016.04.006
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
The rapid growth of documents in different languages, the increased accessibility of electronic documents, and the availability of translation tools have drawn increasing attention to cross-lingual plagiarism detection in recent years. The task of cross-language plagiarism detection entails two main steps: candidate retrieval and pairwise document similarity assessment. In this paper we examine candidate retrieval, where the goal is to find potential source documents of a suspicious text. Our proposed method for cross-language plagiarism detection is a keyword-focused approach. Since plagiarism usually occurs in parts of a text, the texts must be segmented into fragments to detect local similarity. We therefore propose a topic-based segmentation algorithm that converts the suspicious document into a set of related passages. We then use a proximity-based model to retrieve the documents with the best-matching passages. Experiments show promising results for this important phase of cross-language plagiarism detection. (C) 2016 Elsevier Ltd. All rights reserved.
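The pipeline outlined in the abstract (segment the suspicious document into passages, then rank source documents by how closely passage keywords co-occur) can be illustrated with a minimal sketch. This is not the authors' algorithm: fixed-size windows stand in for the topic-based segmentation, the scoring function is a simple illustrative proximity measure, and the suspicious text is assumed to have already been translated into the language of the source documents. All function names and parameters (`segment`, `proximity_score`, `retrieve_candidates`, `size`, `window`, `top_k`) are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def segment(tokens, size=8):
    """Split tokens into fixed-size passages.

    Assumption: a stand-in for topic-based segmentation, which the
    abstract names but does not specify.
    """
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def proximity_score(keywords, doc_tokens, window=10):
    """Largest number of distinct keywords found within `window` tokens.

    Illustrative proximity measure only, not the model from the paper.
    """
    keyset = set(keywords)
    positions = [(i, t) for i, t in enumerate(doc_tokens) if t in keyset]
    best = 0
    for start, (pos, _) in enumerate(positions):
        seen = set()
        for p, t in positions[start:]:
            if p - pos >= window:
                break
            seen.add(t)
        best = max(best, len(seen))
    return best


def retrieve_candidates(suspicious, sources, top_k=1):
    """Rank source documents by their best-matching passage score."""
    passages = segment(tokenize(suspicious))
    scores = {}
    for name, text in sources.items():
        doc_tokens = tokenize(text)
        scores[name] = max(
            (proximity_score(p, doc_tokens) for p in passages), default=0
        )
    # Sort by descending score, breaking ties by name for determinism.
    return sorted(scores, key=lambda n: (-scores[n], n))[:top_k]


if __name__ == "__main__":
    sources = {
        "a": "the quick brown fox jumps over the lazy dog",
        "b": "stock markets fell sharply today amid inflation fears",
    }
    print(retrieve_candidates("a quick brown fox jumps over a lazy dog", sources))
```

Scoring each document by its best passage, rather than by the whole suspicious text, reflects the abstract's point that plagiarism usually affects only parts of a document; a real system would replace both the segmentation and the scoring with the methods described in the paper.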
Pages: 1004-1017
Page count: 14