Efficient clustering-based source code plagiarism detection using PIY

Cited by: 0
Authors
Tony Ohmann
Imad Rahal
Affiliations
[1] University of Massachusetts, School of Computer Science
[2] College of Saint Benedict and Saint John's University, Department of Computer Science
Keywords
Plagiarism detection; Data clustering; n-Grams; Parallel computing; NUMA
DOI
Not available
Abstract
Vast amounts of information available online make plagiarism increasingly easy to commit, and this is particularly true of source code. The traditional approach to detecting copied work in a course setting is manual inspection, which is not only tedious but also typically misses code plagiarized from outside sources or even from an earlier offering of the course. Systems that automatically detect source code plagiarism exist, but they tend to focus on small submission sets. One such system, which has become the standard in automated source code plagiarism detection, is Measure Of Software Similarity (MOSS) (Schleimer et al., in Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, ACM, San Diego, 2003). In this work, we present an approach called Program It Yourself (PIY), which is empirically shown to outperform MOSS in detection accuracy. By utilizing parallel processing and data clustering, PIY also maintains detection accuracy and reasonable runtimes even when using extremely large data repositories.
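The cited Schleimer et al. paper describes winnowing. Winnowing is the document-fingerprinting technique behind MOSS: hash every k-gram, then keep only the minimum hash in each window of w consecutive hashes. The Python sketch below illustrates that general idea only. It is not PIY's method and not MOSS's actual implementation. The function names (`normalize`, `kgram_hashes`, `winnow`, `fingerprint_similarity`) are hypothetical. The whitespace-stripping normalization, the MD5-derived 32-bit hash, the default values of `k` and `w`, and the Jaccard scoring are all illustrative assumptions.

```python
import hashlib
import re


def normalize(source: str) -> str:
    """Strip all whitespace and lowercase the text. This is an assumed,
    simplistic normalization; real tools usually tokenize the program."""
    return re.sub(r"\s+", "", source).lower()


def kgram_hashes(text: str, k: int) -> list[int]:
    """Hash every contiguous k-gram of the text to a 32-bit integer
    (assumption: first 4 bytes of an MD5 digest)."""
    hashes = []
    for i in range(len(text) - k + 1):
        digest = hashlib.md5(text[i:i + k].encode("utf-8")).digest()
        hashes.append(int.from_bytes(digest[:4], "big"))
    return hashes


def winnow(hashes: list[int], w: int) -> set[tuple[int, int]]:
    """Select the minimum hash in every window of w consecutive hashes,
    breaking ties by taking the rightmost minimum. Each selected
    (hash, position) pair is stored once because the result is a set."""
    if not hashes:
        return set()
    fingerprints = set()
    # If there are fewer than w hashes, treat the whole list as one window.
    for start in range(max(1, len(hashes) - w + 1)):
        window = hashes[start:start + w]
        # Smallest value first; among equal values, the largest index wins.
        offset = min(range(len(window)), key=lambda j: (window[j], -j))
        fingerprints.add((window[offset], start + offset))
    return fingerprints


def fingerprint_similarity(a: str, b: str, k: int = 5, w: int = 4) -> float:
    """Jaccard similarity of two documents' winnowed hash values
    (positions dropped). This scoring rule is an illustrative assumption."""
    fa = {h for h, _ in winnow(kgram_hashes(normalize(a), k), w)}
    fb = {h for h, _ in winnow(kgram_hashes(normalize(b), k), w)}
    if not fa and not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


# Winnowing a fixed hash sequence with window size 4.
example = [77, 74, 42, 17, 98, 50, 17, 98, 8, 88, 67, 39, 77, 74, 42, 17, 98]
print(sorted(winnow(example, 4)))  # [(8, 8), (17, 3), (17, 6), (17, 15), (39, 11)]

# Two snippets that differ only in whitespace normalize to the same string.
original = "int total = 0; for (int i = 0; i < n; i++) total += a[i];"
reformatted = "int total=0;\nfor(int i=0;i<n;i++)\n    total+=a[i];"
print(fingerprint_similarity(original, reformatted))  # 1.0
```

Selecting window minima guarantees that any shared substring of at least `w + k - 1` characters yields at least one common fingerprint. Storing only a subset of hashes keeps the comparison cheap. Note that this character-level sketch does not detect renamed identifiers, which real detectors handle by tokenizing first.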
Pages: 445-472
Number of pages: 27
Related papers (50 in total)
  • [21] Source Code Plagiarism Detection and Performance Analysis Using Fingerprint Based Distance Measure Method
    Narayanan, Sandhya
    Simi, S.
    [J]. PROCEEDINGS OF 2012 7TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION, VOLS I-VI, 2012: 1065-1068
  • [22] Improving Source Code Plagiarism Detection: Lessons Learned
    Misic, Marko J.
    Protic, Jelica Z.
    Tomasevic, Milo V.
    [J]. 2017 25TH TELECOMMUNICATION FORUM (TELFOR), 2017: 856-863
  • [23] CPDP: A Robust Technique for Plagiarism Detection in Source Code
    Muddu, Basavaraju
    Asadullah, Allahbaksh
    Bhat, Vasudev
    [J]. 2013 7TH INTERNATIONAL WORKSHOP ON SOFTWARE CLONES (IWSC), 2013: 39-45
  • [24] A SOURCE CODE AND NON-SOURCE CODE PLAGIARISM DETECTION RESEARCH FOR C PROGRAM
    Zhong Mei
    Li Yanchen
    Liu Dongsheng
    [J]. 2011 3RD INTERNATIONAL CONFERENCE ON COMPUTER TECHNOLOGY AND DEVELOPMENT (ICCTD 2011), VOL 3, 2012: 543-547
  • [25] Review of source-code plagiarism detection in academia
    Novak, Matija
    [J]. 2016 39TH INTERNATIONAL CONVENTION ON INFORMATION AND COMMUNICATION TECHNOLOGY, ELECTRONICS AND MICROELECTRONICS (MIPRO), 2016: 796-801
  • [26] The Source Code Plagiarism Detection based on Function Sub-string Matching
    Xiao JingZhong
    Xiao Li
    [J]. 2011 INTERNATIONAL CONFERENCE ON FUTURE COMPUTER SCIENCE AND APPLICATION (FCSA 2011), VOL 1, 2011: 397-400
  • [27] Design Patterns based Pre-processing of Source Code for Plagiarism Detection
    Asadullah, Allahbaksh
    Basavaraju, M.
    Stern, Ilan
    Bhat, Vasudev D.
    [J]. 2012 19TH ASIA-PACIFIC SOFTWARE ENGINEERING CONFERENCE WORKSHOPS (APSECW), VOL 2, 2012: 128-135
  • [28] Efficient Clustering-Based Outlier Detection Algorithm for Dynamic Data Stream
    Elahi, Manzoor
    Li, Kun
    Nisar, Wasif
    Lv, Xinjie
    Wang, Hongan
    [J]. FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 5, PROCEEDINGS, 2008: 298-304
  • [29] An Efficient Local Region and Clustering-Based Ensemble System for Intrusion Detection
    Huu Hoa Nguyen
    Harbi, Nouria
    Darmont, Jerome
    [J]. PROCEEDINGS OF THE 15TH INTERNATIONAL DATABASE ENGINEERING & APPLICATIONS SYMPOSIUM (IDEAS '11), 2011: 185-191
  • [30] ES-Plag: Efficient and sensitive source code plagiarism detection tool for academic environment
    Sulistiani, Lisan
    Karnalim, Oscar
    [J]. COMPUTER APPLICATIONS IN ENGINEERING EDUCATION, 2019, 27(01): 166-182