PLAGIARISM DETECTION IN TEXT DOCUMENTS USING SENTENCE BOUNDED STOP WORD N-GRAMS

Times cited: 0
Authors
Gupta, Deepa [1]
Vani, K. [1]
Leema, L. M. [1]
Affiliation
[1] Amrita Vishwa Vidyapeetham, Amrita Sch Engn, Bangalore Campus, Bangalore 560035, Karnataka, India
Keywords
Plagiarism detection; Extrinsic plagiarism; Stop word; Sentence bounded; POS tagging
DOI
Not available
Chinese Library Classification (CLC) number
T [Industrial Technology]
Subject classification code
08
Abstract
With the evolution of technologies such as internet search engines and improved text editors, plagiarism has become a critical issue. Many methods already exist for detecting verbatim plagiarism, that is, simple copy-and-paste plagiarism, but intelligent plagiarism is considerably more complex. Intelligent plagiarism includes idea adoption, translation and text manipulation, which are more challenging to detect. This paper attempts to detect intelligent plagiarism using the structural information within a document. This is done by extracting stop words, in contrast to other methods, which usually rely on content words. The proposed method enhances this existing idea by adding rough sentence boundaries to the stop-word profiles. The method is further extended with part-of-speech (POS) tags, and the system is evaluated on sample documents from the PAN-2010 dataset. The results are compared with the baseline approach, and performance is evaluated using the standard PAN measures.
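The abstract describes the pipeline only at a high level. The Python sketch below illustrates the general idea of sentence-bounded stop-word n-grams under stated assumptions: the 30-word `STOP_WORDS` list, the regex-based rough sentence splitter and the set-intersection matching are all hypothetical choices for illustration, not the authors' implementation, and the POS-tag extension is omitted. The `plagdet` helper follows the standard PAN overall score, the F1 of precision and recall divided by log2(1 + granularity).

```python
import math
import re
from typing import List, Set, Tuple

# Hypothetical stop-word list for illustration only; the paper's actual list
# is not given in this record.
STOP_WORDS = frozenset({
    "the", "of", "and", "a", "in", "to", "is", "was", "it", "for",
    "with", "he", "be", "on", "i", "that", "by", "at", "you", "as",
    "are", "this", "not", "but", "from", "or", "have", "an", "they", "which",
})


def split_sentences(text: str) -> List[str]:
    """Rough sentence splitting: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def stop_word_sequence(sentence: str) -> List[str]:
    """Keep only the stop words of a sentence, in their original order."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [t for t in tokens if t in STOP_WORDS]


def sentence_bounded_ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Stop-word n-grams built per sentence, so no n-gram crosses a sentence boundary."""
    grams: Set[Tuple[str, ...]] = set()
    for sentence in split_sentences(text):
        seq = stop_word_sequence(sentence)
        for i in range(len(seq) - n + 1):
            grams.add(tuple(seq[i:i + n]))
    return grams


def shared_ngrams(suspicious: str, source: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Candidate evidence of reuse: sentence-bounded n-grams present in both documents."""
    return sentence_bounded_ngrams(suspicious, n) & sentence_bounded_ngrams(source, n)


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """PAN overall score: F1 of precision and recall divided by log2(1 + granularity)."""
    if precision + recall == 0:
        return 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return f1 / math.log2(1 + granularity)
```

For example, with the source text "The cat sat on the mat. It was in the house of the old man.", the sentence-bounded 3-grams are `(the, on, the)` from the first sentence and four grams from the second; a 3-gram such as `(on, the, it)` is never produced because it would span the sentence boundary.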
Pages: 1403-1420
Page count: 18
Related papers
50 in total
  • [1] Plagiarism Detection Using Stopword n-grams
    Stamatatos, Efstathios
    [J]. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2011, 62 (12) : 2512 - 2527
  • [2] UNORDERED N-GRAMS: NEW APPROACH IN TEXT PLAGIARISM DETECTION
    Pribil, Jiri
    Leseticky, Ondrej
    Kubalova, Kamila
    [J]. INFORMATION TECHNOLOGIES' 2009, 2009 : 243 - 249
  • [3] Sentence Classification Using N-Grams in Urdu Language Text
    Awan, Malik Daler Ali
    Ali, Sikandar
    Samad, Ali
    Iqbal, Nadeem
    Missen, Malik Muhammad Saad
    Ullah, Niamat
    [J]. SCIENTIFIC PROGRAMMING, 2021, 2021
  • [4] Using Word N-Grams as Features in Arabic Text Classification
    Al-Thubaity, Abdulmohsen
    Alhoshan, Muneera
    Hazzaa, Itisam
    [J]. SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING, 2015, 569 : 35 - 43
  • [5] Word Length n-Grams for Text Re-use Detection
    Barron-Cedeno, Alberto
    Basile, Chiara
    Degli Esposti, Mirko
    Rosso, Paolo
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2010, 6008 : 687+
  • [6] On Automatic Plagiarism Detection Based on n-Grams Comparison
    Barron-Cedeno, Alberto
    Rosso, Paolo
    [J]. ADVANCES IN INFORMATION RETRIEVAL, PROCEEDINGS, 2009, 5478 : 696 - 700
  • [7] SPEECH RECOGNITION USING FUNCTION-WORD N-GRAMS AND CONTENT-WORD N-GRAMS
    ISOTANI, R
    MATSUNAGA, S
    SAGAYAMA, S
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 1995, E78D (06) : 692 - 697
  • [8] Using word semantic concepts for plagiarism detection in text documents
    Chang, Chia-Yang
    Lee, Shie-Jue
    Wu, Chih-Hung
    Liu, Chih-Feng
    Liu, Ching-Kuan
    [J]. INFORMATION RETRIEVAL JOURNAL, 2021, 24 (4-5) : 298 - 321
  • [9] Text coherence new method using word2vec sentence vectors and most likely n-grams
    Abdolahi, Mohamad
    Zahedi, Morteza
    [J]. 2017 3RD IRANIAN CONFERENCE ON SIGNAL PROCESSING AND INTELLIGENT SYSTEMS (ICSPIS), 2017 : 105 - 109