A Ranking-Based Text Matching Approach for Plagiarism Detection

Cited by: 3
Authors
Kong, Leilei [1]
Han, Zhongyuan [1]
Qi, Haoliang [2]
Lu, Zhimao [3]
Affiliations
[1] Heilongjiang Inst Technol, Harbin, Heilongjiang, Peoples R China
[2] State Key Lab Digital Publishing Technol China, Harbin, Heilongjiang, Peoples R China
[3] Dalian Univ Technol, Dalian, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
plagiarism detection; plagiarism text matching; high-obfuscation plagiarism; ranking; Meteor; n-grams
DOI
10.1587/transfun.E101.A.799
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper addresses text matching for plagiarism detection, the task of identifying the matching plagiarized segments in a pair consisting of a suspicious document and its source document. Heuristic-based methods have been the main approach to this problem, but heuristics rely on expert experience and cannot readily integrate additional features to detect high-obfuscation plagiarism matches. This paper proposes a statistical machine learning approach, the Ranking-based Text Matching Approach for Plagiarism Detection, to handle high-obfuscation plagiarism. Plagiarism text matching is formalized as a ranking problem, and a pairwise learning-to-rank algorithm is used to identify the most probable plagiarism matches for a given suspicious segment. In particular, the Meteor evaluation metrics from machine translation are incorporated into the proposed method to capture lexical and semantic text similarity. The method is evaluated on the PAN12 and PAN13 text alignment corpora for plagiarism detection and compared with the methods that achieved the best performance in PAN12, PAN13, and PAN14. Experimental results show that the proposed method performs statistically significantly better than the baseline methods on all twelve document collections, which belong to five different plagiarism categories. On the PAN12 Artificial-high Obfuscation sub-corpus and the PAN13 Summary Obfuscation sub-corpus in particular, the main evaluation metric, PlagDet, improves by 22% and 43% relative to the baselines, respectively. The proposed method is also more efficient than the baseline methods.
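The ranking formulation in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the specific pairwise learner or the full feature set, so the code below assumes a perceptron-style hinge update on feature differences and uses word n-gram overlap as a stand-in for the Meteor-based similarity features. All function names (`ngram_overlap`, `features`, `train_pairwise`, `rank`) are hypothetical.

```python
from collections import Counter


def ngram_overlap(a, b, n):
    """Jaccard-style overlap between the word n-gram multisets of two segments."""
    def grams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ga, gb = grams(a), grams(b)
    union = sum((ga | gb).values())
    return sum((ga & gb).values()) / union if union else 0.0


def features(suspicious, candidate):
    # Assumption: unigram and bigram overlap only; the paper additionally
    # uses Meteor-derived features, which are not reproduced here.
    return [ngram_overlap(suspicious, candidate, 1),
            ngram_overlap(suspicious, candidate, 2)]


def train_pairwise(pairs, epochs=20, lr=0.1, margin=1.0):
    """Learn weights from (true_match_features, non_match_features) pairs."""
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        for pos, neg in pairs:
            diff = [p - q for p, q in zip(pos, neg)]
            score = sum(wi * di for wi, di in zip(w, diff))
            if score < margin:  # hinge loss active: push the true match above the non-match
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def rank(w, suspicious, candidates):
    """Order candidate source segments by learned score, best first."""
    scored = [(sum(wi * fi for wi, fi in zip(w, features(suspicious, c))), c)
              for c in candidates]
    return [c for _, c in sorted(scored, key=lambda t: t[0], reverse=True)]


# Toy training pair: one true match and one unrelated segment.
suspicious = "the quick brown fox jumps over the lazy dog"
match = "a quick brown fox leaps over the lazy dog"
non_match = "stock markets fell sharply on monday morning"
weights = train_pairwise([(features(suspicious, match),
                           features(suspicious, non_match))])

ranking = rank(weights, "cats chase small mice at night",
               ["the budget was approved yesterday",
                "mice are chased by cats at night"])
```

The pairwise setup only requires knowing which of two candidates is the better match for a suspicious segment, which is why new similarity features can be added to `features` without redesigning any matching heuristic.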
Pages: 799-810
Number of pages: 12