Clustering source code from automated assessment of programming assignments

Authors
Paiva, Jose Carlos [1, 2]
Leal, Jose Paulo [1, 2]
Figueira, Alvaro [1, 2]
Affiliations
[1] INESC TEC, CRACS, Rua Campo Alegre, P-4169007 Porto, Portugal
[2] FCUP, DCC, Rua Campo Alegre, P-4169007 Porto, Portugal
Keywords
Programming learning; Automated assessment; Programming assignments; Clustering; Semantic graph;
DOI
10.1007/s41060-024-00554-5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering of source code is a technique that can help improve feedback in automated program assessment. Grouping code submissions that contain similar mistakes can, for instance, facilitate the identification of students' difficulties to provide targeted feedback. Moreover, solutions with similar functionality but possibly different coding styles or progress levels can allow personalized feedback to students stuck at some point based on a more developed source code or even detect potential cases of plagiarism. However, existing clustering approaches for source code are mostly inadequate for automated feedback generation or assessment systems in programming education. They either give too much emphasis to syntactical program features, rely on expensive computations over pairs of programs, or require previously collected data. This paper introduces an online approach and an implemented tool, AsanasCluster, to cluster source code submissions to programming assignments. The proposed approach relies on program attributes extracted from semantic graph representations of source code, including control and data flow features. The obtained feature vector values are fed into an incremental k-means model. Such a model aims to determine, in a timely manner, the closest cluster for each solution as it enters the system, since clustering is an intermediate step for feedback generation in automated assessment. We have conducted a twofold evaluation of the tool to assess (1) its runtime performance and (2) its precision in separating different algorithmic strategies. To this end, we have applied our clustering approach on a public dataset of real submissions from undergraduate students to programming assignments, measuring the runtimes for the distinct tasks involved: building a model, identifying the closest cluster to a new observation, and recalculating partitions. As for the precision, we partition two groups of programs collected from GitHub. One group contains implementations of two searching algorithms, while the other has implementations of several sorting algorithms. AsanasCluster matches and, in some cases, improves the state-of-the-art clustering tools in terms of runtime performance and precision in identifying different algorithmic strategies. It does so without requiring the execution of the code. Moreover, it is able to start the clustering process from a dataset with only two submissions and continuously partition the observations as they enter the system.
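The incremental k-means step mentioned in the abstract can be illustrated with a minimal sketch. This is not AsanasCluster's implementation: the class name `IncrementalKMeans`, the method `learn_one`, the seeding rule, and the three-value feature vectors are all assumptions made for illustration, and the update shown is the standard sequential (MacQueen-style) centroid update.

```python
import math


class IncrementalKMeans:
    """Sequential k-means over a stream of feature vectors (illustrative only)."""

    def __init__(self, k):
        self.k = k
        self.centroids = []  # one centroid per cluster
        self.counts = []     # observations absorbed by each cluster

    def closest(self, x):
        """Return the index of the centroid nearest to x (Euclidean distance)."""
        return min(range(len(self.centroids)),
                   key=lambda i: math.dist(self.centroids[i], x))

    def learn_one(self, x):
        """Assign x to a cluster, update that centroid, and return its index."""
        x = [float(v) for v in x]
        # Assumed seeding rule: the first k distinct observations become centroids.
        if len(self.centroids) < self.k and x not in self.centroids:
            self.centroids.append(x)
            self.counts.append(1)
            return len(self.centroids) - 1
        i = self.closest(x)
        self.counts[i] += 1
        eta = 1.0 / self.counts[i]  # step size shrinks as the cluster grows
        self.centroids[i] = [c + eta * (v - c)
                             for c, v in zip(self.centroids[i], x)]
        return i


# Hypothetical feature vectors, e.g. (loops, branches, data-flow edges);
# the values are made up and do not come from the paper.
submissions = [(1, 2, 5), (3, 1, 9), (1, 2, 6), (3, 1, 8)]
model = IncrementalKMeans(k=2)
labels = [model.learn_one(v) for v in submissions]
print(labels)
```

Each submission is assigned as soon as it arrives and only the affected centroid is moved, so no pairwise comparison between programs is needed. With `k=2`, the sketch can start from just two observations.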
Pages: 12