FMCF: A fusing multiple code features approach based on Transformer for Solidity smart contracts source code summarization

Cited by: 0
Authors
Lei, Gang [1]
Zhang, Donghua [2]
Xiao, Jianmao [1]
Fan, Guodong [3]
Cao, Yuanlong [1]
Feng, Zhiyong [3]
Affiliations
[1] Jiangxi Normal Univ, Sch Software, Nanchang 330022, Peoples R China
[2] Jiangxi Normal Univ, Sch Digital Ind, Shangrao 334000, Peoples R China
[3] Tianjin Univ, Coll Intelligence & Comp, Tianjin 300350, Peoples R China
Keywords
Solidity smart contracts; Source code summarization; Feature fusion; Transformer; Structure-based traversal; Manual evaluation; Natural-language summaries
DOI
10.1016/j.asoc.2024.112238
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A smart contract is a software program executed on a blockchain, designed to support functions such as contract execution, asset administration, and identity validation within a secure and decentralized ecosystem. Summarizing the code of Solidity smart contracts helps developers quickly grasp essential functionality, thereby enhancing the security posture of Ethereum-based projects. Existing work on smart contract code summarization relies mainly on traditional information retrieval and single code features, resulting in suboptimal performance. In this study, we propose FMCF, a Transformer-based approach that fuses multiple code features for Solidity summarization. First, FMCF builds contract integrity modeling and state immutability modeling in the data preprocessing stage to process and filter data that meets security conditions. At the same time, FMCF retains the self-attention mechanism to construct a Graph Attention Network (GAT) encoder and a CodeBERT encoder, which respectively extract multiple feature vectors of the code to ensure the integrity of the source code information. FMCF then feeds these two types of feature vectors into a feature fusion module, which combines them by weighted summation, and passes the fused feature vectors to the Transformer decoder to generate the final smart contract code summary. The experimental results show that FMCF outperforms the standard baseline methods by 12.45% in BLEU score and maximally preserves the semantic information and syntactic structure of the source code. The results suggest that FMCF offers a promising direction for future research on smart contract code summarization, thereby helping developers enhance the security of development projects.
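The weighted-summation fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weights, the vector dimensions, or whether the weights are learned, so the function name `fuse_features`, the single scalar weight `alpha`, and the toy vectors below are all hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from typing import List


def fuse_features(graph_vec: List[float],
                  text_vec: List[float],
                  alpha: float = 0.5) -> List[float]:
    """Fuse two equal-length feature vectors by weighted summation.

    graph_vec stands in for a structural feature vector (for example,
    from a GAT encoder) and text_vec for a semantic feature vector
    (for example, from a CodeBERT encoder). alpha is an assumed scalar
    weight in [0, 1]; the paper's actual weighting scheme is not
    stated in the abstract.
    """
    if len(graph_vec) != len(text_vec):
        raise ValueError("feature vectors must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Element-wise convex combination of the two feature vectors.
    return [alpha * g + (1.0 - alpha) * t
            for g, t in zip(graph_vec, text_vec)]


# Toy vectors standing in for encoder outputs.
fused = fuse_features([1.0, 2.0], [3.0, 4.0], alpha=0.25)
print(fused)
```

In a full model, a fused vector of this kind would be passed to the decoder in place of either encoder's output alone; a larger `alpha` would weight the structural features more heavily.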
Pages: 17
Related papers
50 in total
  • [21] Retrieval-based Neural Source Code Summarization
    Zhang, Jian
    Wang, Xu
    Zhang, Hongyu
    Sun, Hailong
    Liu, Xudong
    2020 ACM/IEEE 42ND INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2020), 2020, : 1385 - 1397
  • [22] Re_Trans: Combined Retrieval and Transformer Model for Source Code Summarization
    Zhang, Chunyan
    Zhou, Qinglei
    Qiao, Meng
    Tang, Ke
    Xu, Lianqiu
    Liu, Fudong
    ENTROPY, 2022, 24 (10)
  • [23] M2TS: Multi-Scale Multi-Modal Approach Based on Transformer for Source Code Summarization
    Gao, Yuexiu
    Lyu, Chen
    30TH IEEE/ACM INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION (ICPC 2022), 2022, : 24 - 35
  • [24] M2TS: Multi-Scale Multi-Modal Approach Based on Transformer for Source Code Summarization
    Gao, Yuexiu
    Lyu, Chen
    IEEE International Conference on Program Comprehension, 2022, 2022-March : 24 - 35
  • [25] VulMPFF: A Vulnerability Detection Method for Fusing Code Features in Multiple Perspectives
    Cao, Xiansheng
    Wang, Junfeng
    Wu, Peng
    Fang, Zhiyang
    IET INFORMATION SECURITY, 2024, 2024 (01)
  • [26] Bi-LSTM-Based Neural Source Code Summarization
    Aljumah, Sarah
    Berriche, Lamia
    APPLIED SCIENCES-BASEL, 2022, 12 (24)
  • [27] Goal and Policy Based Code Generation and Deployment of Smart Contracts
    Tsiounis, Konstantinos
    Kontogiannis, Kostas
    2022 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING (SANER 2022), 2022, : 1227 - 1230
  • [28] DLBT: Deep Learning-Based Transformer to Generate Pseudo-Code from Source Code
    Gad, Walaa
    Alokla, Anas
    Nazih, Waleed
    Aref, Mustafa
    Salem, Abdel-badeeh
    CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 70 (02): : 3117 - 3132
  • [29] Survey on Neural Network-based Automatic Source Code Summarization Technologies
    Song X.-T.
    Sun H.-L.
    Ruan Jian Xue Bao/Journal of Software, 2022, 33 (01): : 55 - 77
  • [30] Source Code Summarization Using Attention-based Keyword Memory Networks
    Choi, YunSeok
    Kim, Suah
    Lee, Jee-Hyong
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP 2020), 2020, : 564 - 570