Comparison of Image-Based and Text-Based Source Code Classification Using Deep Learning

Authors
Kiyak E.O. [1]
Cengiz A.B. [1]
Birant K.U. [2]
Birant D. [2]
Affiliations
[1] The Graduate School of Natural and Applied Sciences, Dokuz Eylul University, Izmir
[2] Department of Computer Engineering, Dokuz Eylul University, Izmir
Keywords
Deep learning; Image classification; Programming languages; Software engineering; Source code classification; Text mining
DOI
10.1007/s42979-020-00281-1
Abstract
Source code classification (SCC) is the task of assigning code to categories according to a criterion such as functionality, programming language, or vulnerability. Many source code archives are organized by programming language, so that desired code fragments can be found easily by searching within the archive. However, having field experts organize source code archives manually is labor-intensive and impractical given the rapidly growing volume of available source code. Therefore, this study proposes new convolutional neural network (CNN) architectures for building source code classifiers that automatically identify the programming language of source code. This is the first study to compare the performance of deep learning algorithms on programming language identification using both image and text files. The experiments are performed on three source code datasets to identify eight programming languages: C, C++, C#, Go, Python, Ruby, Rust, and Java. The comparative results indicate that, although the text-based and image-based SCC approaches achieve similar and very high accuracies (> 93.5%), text-based classification performs significantly better in terms of execution time. © 2020, Springer Nature Singapore Pte Ltd.
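
To make the text-versus-image distinction in the abstract concrete, the following is a minimal sketch of how one code snippet could be turned into two kinds of classifier input. The abstract does not describe the preprocessing the authors used, so everything here is an assumption for illustration: the function names `text_to_char_ids` and `text_to_image_grid`, the byte-level encoding, and the byte-grid "image" (a stand-in for a real rendered picture of the code) are all hypothetical.

```python
from typing import List


def text_to_char_ids(source: str, max_len: int = 64) -> List[int]:
    """Text view (assumed): a fixed-length sequence of integer ids.

    Each UTF-8 byte is shifted by 1 so that id 0 can be reserved for padding.
    """
    data = source.encode("utf-8", errors="replace")[:max_len]
    ids = [b + 1 for b in data]
    return ids + [0] * (max_len - len(ids))


def text_to_image_grid(source: str, width: int = 8, height: int = 8) -> List[List[int]]:
    """Image view (assumed): bytes laid out row by row as a grayscale grid.

    Each cell holds a value in 0..255; unused cells are filled with 0.
    """
    data = source.encode("utf-8", errors="replace")[: width * height]
    flat = list(data) + [0] * (width * height - len(data))
    return [flat[r * width:(r + 1) * width] for r in range(height)]


if __name__ == "__main__":
    snippet = "fn main() {}"  # a tiny Rust snippet, one of the eight languages
    print(text_to_char_ids(snippet, max_len=16))
    for row in text_to_image_grid(snippet, width=4, height=4):
        print(row)
```

In this sketch the text view is a one-dimensional sequence, suitable for a 1D convolution over characters, and the image view is a two-dimensional grid, suitable for a 2D convolution. The 2D input typically carries more values per sample, which is one plausible reason, though not one stated in the abstract, for a difference in execution time.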