Comparison of Image-Based and Text-Based Source Code Classification Using Deep Learning

Cited by: 0

Authors:

Kiyak E.O. [1]

Cengiz A.B. [1]

Birant K.U. [2]

Birant D. [2]

Affiliations:

[1] The Graduate School of Natural and Applied Sciences, Dokuz Eylul University, Izmir

[2] Department of Computer Engineering, Dokuz Eylul University, Izmir

Source:

SN Computer Science | 2020 / Vol. 1 / No. 5

Keywords:

Deep learning; Image classification; Programming languages; Software engineering; Source code classification; Text mining

DOI:

10.1007/s42979-020-00281-1

Abstract:

Source code classification (SCC) is the task of assigning source code to categories according to a criterion such as functionality, programming language, or vulnerability. Many source code archives are organized by programming language, so that desired code fragments can be found easily by searching within the archive. However, having field experts organize such archives manually is labor-intensive and impractical, given the rapid growth of available source code. Therefore, this study proposes new convolutional neural network (CNN) architectures for building source code classifiers that automatically identify the programming language of a source file. This is the first study in which the performance of deep learning algorithms for programming language identification is compared on both image files and text files. Experiments are performed on three source code datasets to identify eight programming languages: C, C++, C#, Go, Python, Ruby, Rust, and Java. The comparative results indicate that although the text-based and image-based SCC approaches achieve similar, very high accuracies (> 93.5%), text-based classification is significantly faster in terms of execution time. © 2020, Springer Nature Singapore Pte Ltd.
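The abstract contrasts text-based and image-based inputs to a CNN classifier but does not say how a source file is turned into either input. The Python sketch below is a hypothetical illustration only, not the authors' pipeline: it assumes a character-level text encoding (ASCII code points, truncated and zero-padded to a fixed length) and a simple image encoding in which each character becomes one grayscale pixel on a fixed grid. The names `label_index`, `char_id`, `encode_text`, and `encode_image`, and all sizes, are assumptions chosen for the example.

```python
# Hypothetical input encodings for a programming-language classifier.
# The paper's real preprocessing is not given in the abstract; all names
# and sizes below are illustrative assumptions.
from typing import List

# The eight target languages named in the abstract.
LANGUAGES = ["C", "C++", "C#", "Go", "Python", "Ruby", "Rust", "Java"]


def label_index(lang: str) -> int:
    """Map a language name to a class index (raises ValueError if unknown)."""
    return LANGUAGES.index(lang)


def char_id(ch: str) -> int:
    """ASCII code point for ch; any non-ASCII character maps to 0."""
    code = ord(ch)
    return code if code < 128 else 0


def encode_text(source: str, max_len: int = 64) -> List[int]:
    """Text-based input: a 1-D sequence of character ids, truncated to
    max_len and right-padded with 0."""
    ids = [char_id(ch) for ch in source[:max_len]]
    return ids + [0] * (max_len - len(ids))


def encode_image(source: str, rows: int = 16, cols: int = 32) -> List[List[float]]:
    """Image-based input: lay the code out on a rows x cols grid, one
    character per pixel, with intensity char_id / 127 (0.0 = empty)."""
    grid = [[0.0] * cols for _ in range(rows)]
    for r, line in enumerate(source.splitlines()[:rows]):
        for c, ch in enumerate(line[:cols]):
            grid[r][c] = char_id(ch) / 127.0
    return grid


if __name__ == "__main__":
    snippet = "def f():\n    return 1"
    # Prints the class index, the text length, and the number of image rows.
    print(label_index("Python"), len(encode_text(snippet)), len(encode_image(snippet)))
```

With these default sizes, the text input is a vector of 64 values while the image input is a 16 x 32 grid of 512 values; the first suits a 1-D CNN and the second a 2-D CNN built in whichever deep learning framework is at hand.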
