Transformer-based networks over tree structures for code classification

Cited: 0
Authors
Hua, Wei [1]
Liu, Guangzhong [1]
Affiliation
[1] College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
Source
Applied Intelligence | 2022, Vol. 52, No. 08
Funding
National Natural Science Foundation of China
Keywords
Natural language processing systems - Syntactics - Codes (symbols) - Computer software - Convolutional neural networks - C (programming language) - Recurrent neural networks - Semantics - Cloning - Network coding;
Abstract
In software engineering (SE), code classification and related tasks such as code clone detection remain challenging problems. Because of the elusive syntax and complicated semantics of software programs, traditional SE approaches still struggle to differentiate the functionalities of code snippets at the semantic level with high accuracy. As artificial intelligence (AI) techniques have advanced in recent years, exploring machine/deep learning techniques for code classification has become important. However, most existing machine/deep learning-based approaches process code text with convolutional neural networks (CNNs) or recurrent neural networks (RNNs); both architectures suffer from vanishing gradients and fail to capture long-distance dependencies between code statements, resulting in poor performance on downstream tasks. In this paper, we propose TBCC (Transformer-Based Code Classifier), a novel transformer-based neural network for programming language processing that avoids these two problems. Moreover, to capture important syntactic features of programming languages, we split deep abstract syntax trees (ASTs) into smaller subtrees that exploit the syntactic information in code statements. We have applied TBCC to two common program comprehension tasks to verify its effectiveness: code classification for C programs and code clone detection for Java programs. The experimental results show that TBCC achieves state-of-the-art performance, outperforming the baseline methods in terms of accuracy, recall, and F1 score. To support subsequent research, the code of TBCC has been released. © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
Pages: 8895-8909
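
The abstract above outlines the TBCC pipeline: split deep ASTs into smaller, statement-level subtrees, serialize them, and classify the result with a transformer encoder. The following is a minimal illustrative sketch of that idea, not the authors' released implementation: it assumes Python's built-in ast module stands in for the C/Java parsers used in the paper, and the vocabulary construction, pooling, model sizes (d_model, nhead, num_layers), and the 104-class output (echoing the common 104-class C program benchmark) are hypothetical choices for demonstration.

# Sketch of the TBCC idea: AST subtrees -> token sequences -> Transformer classifier.
# Assumptions: Python's `ast` module replaces the paper's C/Java parsers;
# all hyperparameters and the toy vocabulary are illustrative only.

import ast
import torch
import torch.nn as nn

def split_into_subtrees(source: str):
    """Split a parsed module into one subtree per top-level statement."""
    tree = ast.parse(source)
    return list(tree.body)

def serialize_subtree(node) -> list[str]:
    """Traverse a subtree and emit one token per AST node type."""
    return [type(n).__name__ for n in ast.walk(node)]

class TransformerCodeClassifier(nn.Module):
    def __init__(self, vocab_size, num_classes, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=0)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, token_ids, pad_mask):
        x = self.embed(token_ids)                        # (batch, seq, d_model)
        h = self.encoder(x, src_key_padding_mask=pad_mask)
        h = h.masked_fill(pad_mask.unsqueeze(-1), 0.0)   # zero out padded positions
        pooled = h.sum(1) / (~pad_mask).sum(1, keepdim=True)  # mean over real tokens
        return self.classifier(pooled)

# Toy usage: build a tiny vocabulary from one snippet and run a forward pass.
code = "def add(a, b):\n    return a + b\n"
tokens = [t for st in split_into_subtrees(code) for t in serialize_subtree(st)]
vocab = {tok: i + 1 for i, tok in enumerate(sorted(set(tokens)))}  # index 0 = padding
ids = torch.tensor([[vocab[t] for t in tokens]])
mask = torch.zeros_like(ids, dtype=torch.bool)           # no padding in this toy batch
model = TransformerCodeClassifier(vocab_size=len(vocab) + 1, num_classes=104)
logits = model(ids, mask)                                # (1, 104) class scores

Splitting the tree into per-statement subtrees keeps each serialized sequence short, which is convenient for self-attention; the authors' released code should be consulted for the exact traversal order, vocabulary, and hyperparameters.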
Related Papers
50 records in total
  • [1] Transformer-based networks over tree structures for code classification
    Hua, Wei
    Liu, Guangzhong
    [J]. APPLIED INTELLIGENCE, 2022, 52 (08) : 8895 - 8909
  • [2] Transformer-based networks over tree structures for code classification
    Wei Hua
    Guangzhong Liu
    [J]. Applied Intelligence, 2022, 52 : 8895 - 8909
  • [3] Transformer-Based Spiking Neural Networks for Multimodal Audiovisual Classification
    Guo, Lingyue
    Gao, Zeyu
    Qu, Jinye
    Zheng, Suiwu
    Jiang, Runhao
    Lu, Yanfeng
    Qiao, Hong
    [J]. IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2024, 16 (03) : 1077 - 1086
  • [4] Transformer-based Bug/Feature Classification
    Ozturk, Ceyhun E.
    Yilmaz, Eyup Halit
    Koksal, Omer
    [J]. 2023 31ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU, 2023,
  • [5] EEG Classification with Transformer-Based Models
    Sun, Jiayao
    Xie, Jin
    Zhou, Huihui
    [J]. 2021 IEEE 3RD GLOBAL CONFERENCE ON LIFE SCIENCES AND TECHNOLOGIES (IEEE LIFETECH 2021), 2021, : 92 - 93
  • [6] SeTransformer: A Transformer-Based Code Semantic Parser for Code Comment Generation
    Li, Zheng
    Wu, Yonghao
    Peng, Bin
    Chen, Xiang
    Sun, Zeyu
    Liu, Yong
    Paul, Doyle
    [J]. IEEE TRANSACTIONS ON RELIABILITY, 2023, 72 (01) : 258 - 273
  • [7] An Empirical Study of Code Smells in Transformer-based Code Generation Techniques
    Siddiq, Mohammed Latif
    Majumder, Shafayat H.
    Mim, Maisha R.
    Jajodia, Sourov
    Santos, Joanna C. S.
    [J]. 2022 IEEE 22ND INTERNATIONAL WORKING CONFERENCE ON SOURCE CODE ANALYSIS AND MANIPULATION (SCAM 2022), 2022, : 71 - 82
  • [8] Transformer-based Hierarchical Encoder for Document Classification
    Sakhrani, Harsh
    Parekh, Saloni
    Ratadiya, Pratik
    [J]. 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS ICDMW 2021, 2021, : 852 - 858
  • [9] Practical Transformer-based Multilingual Text Classification
    Wang, Cindy
    Banko, Michele
    [J]. 2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, NAACL-HLT 2021, 2021, : 121 - 129
  • [10] BertSRC: transformer-based semantic relation classification
    Lee, Yeawon
    Son, Jinseok
    Song, Min
    [J]. BMC MEDICAL INFORMATICS AND DECISION MAKING, 2022, 22 (01)