Efficient clustering-based source code plagiarism detection using PIY

Cited by: 0
Authors
Tony Ohmann
Imad Rahal
Affiliations
[1] University of Massachusetts, School of Computer Science
[2] Saint John’s University, Department of Computer Science, College of Saint Benedict
Source
Knowledge and Information Systems, 2015, 43 (02)
Keywords
Plagiarism detection; Data clustering; n-Grams; Parallel computing; NUMA
DOI
Not available
Abstract
The vast amount of information available online makes plagiarism increasingly easy to commit, and this is particularly true of source code. The traditional approach to detecting copied work in a course setting is manual inspection, which is not only tedious but also typically misses code plagiarized from outside sources or from an earlier offering of the course. Systems that automatically detect source code plagiarism exist but tend to focus on small submission sets. One such system, which has become the standard in automated source code plagiarism detection, is Measure of Software Similarity (MOSS) (Schleimer et al., in Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, ACM, San Diego, 2003). In this work, we present an approach called Program It Yourself (PIY), which is empirically shown to outperform MOSS in detection accuracy. By using parallel processing and data clustering, PIY also maintains detection accuracy and reasonable runtimes even on extremely large data repositories.
Pages: 445-472
Page count: 27
Related papers
50 in total
  • [1] Efficient clustering-based source code plagiarism detection using PIY
    Ohmann, Tony
    Rahal, Imad
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2015, 43 (02) : 445 - 472
  • [2] Scalable Source Code Plagiarism Detection Using Source Code Vectors Clustering
    Duracik, Michal
    Krsak, Emil
    Hrkut, Patrik
    [J]. PROCEEDINGS OF 2018 IEEE 9TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE (ICSESS), 2018 : 499 - 502
  • [3] USING CONCEPTS OF TEXT BASED PLAGIARISM DETECTION IN SOURCE CODE PLAGIARISM ANALYSIS
    Duracik, Michal
    Krsak, Emil
    Hrkut, Patrik
    [J]. PLAGIARISM ACROSS EUROPE AND BEYOND 2017, 2017 : 177 - 186
  • [4] Using graph databases in source code plagiarism detection
    Novak, Matija
    Levak, Iva
    [J]. CENTRAL EUROPEAN CONFERENCE ON INFORMATION AND INTELLIGENT SYSTEMS, CECIIS 2022, 2022 : 465 - 470
  • [5] SOURCE CODE PLAGIARISM DETECTION METHOD USING ONTOLOGIES
    Smeureanu, Ion
    Iancu, Bogdan
    [J]. INTERNATIONAL CONFERENCE ON INFORMATICS IN ECONOMY, 2013 : 594 - 597
  • [6] Automatic Source Code Plagiarism Detection
    Kustanto, Cynthia
    Liem, Inggriani
    [J]. SNPD 2009: 10TH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCES, NETWORKING AND PARALLEL DISTRIBUTED COMPUTING, PROCEEDINGS, 2009 : 481 - 486
  • [7] Source Code Representations for Plagiarism Detection
    Duracik, Michal
    Krsak, Emil
    Hrkut, Patrik
    [J]. LEARNING TECHNOLOGY FOR EDUCATION CHALLENGES, LTEC 2018, 2018, 870 : 61 - 69
  • [8] Dynamic Thresholding Mechanisms for IR-Based Filtering in Efficient Source Code Plagiarism Detection
    Karnalim, Oscar
    Sulistiani, Lisan
    [J]. 2018 INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER SCIENCE AND INFORMATION SYSTEMS (ICACSIS), 2018 : 23 - 28
  • [9] Software Source Code Plagiarism and Direction Detection Based on PDG
    Shu, Bo
    Du, Xiaojun
    [J]. MECHATRONICS, ROBOTICS AND AUTOMATION, PTS 1-3, 2013, 373-375 : 1172 - 1177
  • [10] Efficient Source Code Plagiarism Identification Based on Greedy String Tilling
    Haider, Khurram Zeeshan
    Nawaz, Tabassam
    Din, Sami ud
    Javed, Ali
    [J]. INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2010, 10 (12) : 204 - 210