Code Authorship Attribution using content-based and non-content-based features

Cited by: 1
Authors
Bayrami, Parinaz [1 ]
Rice, Jacqueline E. [1 ]
Affiliation
[1] Univ Lethbridge, Dept Math & Comp Sci, Lethbridge, AB, Canada
DOI
10.1109/CCECE53047.2021.9569061
Chinese Library Classification
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Authorship attribution (author identification) is the task of identifying the true author of a sample of work from among many candidates. It is an important research area in natural language analysis, where machine learning approaches are widely used, and previous research has shown that similar techniques can be applied to computer programming (artificial) languages. This paper focuses on the use of machine learning techniques to identify the authors of computer programs. We examine which features capture an author's writing style when a program is classified by author identity. We then propose a novel approach to program author identification in which features extracted from the source code are combined with the authors' sociological features (gender and region) to build the classification model. Several experiments were conducted on two datasets of computer programs written in C++. Our models predict an author's identity with an accuracy of 75%.
Pages: 6
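
The abstract describes combining features taken from C++ source code with the authors' sociological features (gender and region) in one classification model. The following Python sketch illustrates that general idea only. The abstract does not list the paper's features or name its classifier, so everything here is an assumption: the keyword and layout ratios, the label sets `GENDERS` and `REGIONS`, the helper names, and the 1-nearest-neighbour rule used in place of whatever model the authors trained. It also assumes that gender and region are known for the sample being classified.

```python
import math
import re
from collections import Counter

# Hypothetical choices: the abstract does not specify the real feature set.
KEYWORDS = ("for", "while", "if", "return")
GENDERS = ("female", "male")          # assumed label set
REGIONS = ("region_a", "region_b")    # assumed label set


def style_features(source):
    """Return simple stylistic ratios computed from one C++ source string."""
    lines = source.splitlines() or [""]
    tokens = re.findall(r"[A-Za-z_]\w*", source)
    counts = Counter(tokens)
    n_tokens = max(len(tokens), 1)
    # Share of all identifier-like tokens taken by each keyword.
    keyword_share = [counts[k] / n_tokens for k in KEYWORDS]
    layout = [
        # Share of lines indented with a tab.
        sum(line.startswith("\t") for line in lines) / len(lines),
        # Share of lines holding only an opening brace.
        sum(line.strip() == "{" for line in lines) / len(lines),
    ]
    return keyword_share + layout


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over the given categories."""
    return [1.0 if value == c else 0.0 for c in categories]


def combined_features(source, gender, region):
    """Concatenate source-code features with the sociological features."""
    return style_features(source) + one_hot(gender, GENDERS) + one_hot(region, REGIONS)


def predict_author(training, source, gender, region):
    """1-nearest-neighbour over combined vectors.

    training is a list of (author, source, gender, region) tuples.
    """
    query = combined_features(source, gender, region)

    def distance(row):
        _, src, g, r = row
        return math.dist(query, combined_features(src, g, r))

    return min(training, key=distance)[0]


# Toy data invented for illustration; not taken from the paper's datasets.
training = [
    ("author_a",
     "int main()\n{\n\tfor (int i = 0; i < 3; i++)\n\t{\n\t\treturn i;\n\t}\n}\n",
     "female", "region_a"),
    ("author_b",
     "int main() {\n    int i = 0;\n    while (i < 3) {\n        i++;\n    }\n    return i;\n}\n",
     "male", "region_b"),
]
query = "void f()\n{\n\tif (x)\n\t{\n\t\treturn;\n\t}\n}\n"
print(predict_author(training, query, "female", "region_a"))
```

All features in this sketch are ratios or 0/1 indicators, so no single feature dominates the Euclidean distance by scale alone. A realistic experiment would need many samples per author and a held-out evaluation before any accuracy figure could be compared with the 75% reported in the abstract.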