Reducing the Plagiarism Detection Search Space on the Basis of the Kullback-Leibler Distance

Cited by: 0
Authors
Barron-Cedeno, Alberto [1]
Rosso, Paolo [1]
Benedi, Jose-Miguel [1]
Affiliation
[1] Univ Politecn Valencia, Dept Informat Syst & Computat, Camino Vera S-N, Valencia 46022, Spain
Keywords
TEXTS;
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Automatic plagiarism detection with a reference corpus compares a suspicious text to a set of original documents in order to relate the plagiarised fragments to their potential source. Publications on this task often assume that the search space (the set of reference documents) is a narrow set where any search strategy will produce a good output in a short time. However, this is not always true. Reference corpora are often composed of a large set of original documents, for which a simple exhaustive search strategy becomes practically impossible. Before carrying out an exhaustive search, it is necessary to reduce the search space, represented by the documents in the reference corpus, as much as possible. Our experiments with the METER corpus show that a preliminary search space reduction stage, based on the Kullback-Leibler symmetric distance, reduces the search process time dramatically. Additionally, it improves the Precision and Recall obtained by a search strategy based on the exhaustive comparison of word n-grams.
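The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenisation, the epsilon smoothing for unseen words, the cut-off `k`, the containment score, and all function names below are assumptions made for illustration. The distance used is one common symmetric form of the Kullback-Leibler distance, the sum over x of (P(x) - Q(x)) log(P(x) / Q(x)).

```python
import math
from collections import Counter


def word_distribution(text):
    """Relative frequency of each lowercase, whitespace-separated token."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def symmetric_kl(p, q, epsilon=1e-6):
    """Symmetric KL distance: sum of (P(x) - Q(x)) * log(P(x) / Q(x)).

    Words missing from one distribution get probability `epsilon`
    (an assumed smoothing choice; distributions are not renormalised).
    """
    vocab = set(p) | set(q)
    return sum(
        (p.get(w, epsilon) - q.get(w, epsilon))
        * math.log(p.get(w, epsilon) / q.get(w, epsilon))
        for w in vocab
    )


def reduce_search_space(suspicious, references, k=10):
    """Return the names of the k reference documents closest to `suspicious`."""
    p = word_distribution(suspicious)
    scored = sorted(
        (symmetric_kl(p, word_distribution(text)), name)
        for name, text in references.items()
    )
    return [name for _, name in scored[:k]]


def word_ngrams(text, n=3):
    """Set of word n-grams (as tuples) in the text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(suspicious, reference, n=3):
    """Fraction of the suspicious text's word n-grams found in the reference."""
    grams = word_ngrams(suspicious, n)
    if not grams:
        return 0.0
    return len(grams & word_ngrams(reference, n)) / len(grams)
```

In such a setup, `reduce_search_space` would be called first to select a handful of candidate documents, and the more expensive `containment` comparison would then run only against those candidates rather than against the whole reference corpus.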
Pages: 523 / +
Page count: 3