Code Authorship Attribution using content-based and non-content-based features

Cited by: 1
Authors
Bayrami, Parinaz [1]
Rice, Jacqueline E. [1]
Affiliation
[1] Univ Lethbridge, Dept Math & Comp Sci, Lethbridge, AB, Canada
DOI
10.1109/CCECE53047.2021.9569061
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
To attribute authorship (author identification) means to identify the true author of a sample of work among many candidates. Author identification is an important research field in natural language. Machine learning approaches are widely used in natural language analysis, and previous research has shown that similar techniques can be applied in the analysis of computer programming (artificial) languages. This paper focuses on the use of machine learning techniques in the identification of authors of computer programs. We focus on identifying which features capture the writing style of authors in the classification of a computer program according to the author's identity. We then propose a novel approach for computer program author identification. In this method, features from source code of the programs are combined with authors' sociological features (gender and region) to develop the classification model. Several experiments were conducted on two datasets composed of computer programs written in C++. Our models are able to predict an author's identity with a 75% accuracy rate.
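The combination described in the abstract, source-code features joined with the authors' gender and region, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not say which code features or which classifier the paper uses, so the keyword and layout features, the category lists `GENDERS` and `REGIONS`, and the nearest-neighbour rule below are hypothetical stand-ins. The sketch also assumes the sociological attributes are known for the sample being classified.

```python
import math
import re
from collections import Counter

# Hypothetical feature definitions; the abstract does not list the paper's features.
KEYWORDS = ("for", "while", "if", "switch", "return")
GENDERS = ("female", "male")
REGIONS = ("north_america", "europe", "asia")


def code_features(source):
    """Features taken from the C++ source text, all scaled to [0, 1]."""
    lines = source.splitlines() or [""]
    tokens = Counter(re.findall(r"[A-Za-z_]\w*", source))
    total = sum(tokens.values()) or 1
    feats = [tokens[k] / total for k in KEYWORDS]  # relative keyword frequency
    n = len(lines)
    feats.append(sum(l.startswith("\t") for l in lines) / n)          # tab-indented lines
    feats.append(sum(l.strip() == "{" for l in lines) / n)            # braces on their own line
    feats.append(sum(l.strip().startswith("//") for l in lines) / n)  # comment lines
    return feats


def one_hot(value, categories):
    """Encode a categorical attribute as a 0/1 vector."""
    return [1.0 if value == c else 0.0 for c in categories]


def combined_features(source, gender, region):
    """Concatenate source-code features with the sociological attributes."""
    return code_features(source) + one_hot(gender, GENDERS) + one_hot(region, REGIONS)


def nearest_author(train, query):
    """1-nearest-neighbour stand-in for the paper's unspecified classifier."""
    author, _ = min(train, key=lambda item: math.dist(item[1], query))
    return author


# Toy samples (invented for this sketch, not taken from the paper's datasets).
alice_code = "int main()\n{\n\tfor (int i = 0; i < 3; i++)\n\t{\n\t\treturn i;\n\t}\n}\n"
bob_code = "int main() {\n    int i = 0;\n    while (i < 3) {\n        i++;\n    }\n    return i;\n}\n"
query_code = "int sum()\n{\n\tint s = 0;\n\tfor (int i = 0; i < 9; i++)\n\t{\n\t\ts += i;\n\t}\n\treturn s;\n}\n"

train = [
    ("alice", combined_features(alice_code, "female", "north_america")),
    ("bob", combined_features(bob_code, "male", "europe")),
]
query_vec = combined_features(query_code, "female", "north_america")
print(nearest_author(train, query_vec))
```

In this toy setup the one-hot attributes contribute much larger distances than the fractional code features, so any real use of such a scheme would need to consider how the two groups of features are scaled relative to each other.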
Pages: 6