Bash comment generation via data augmentation and semantic-aware CodeBERT

Authors
Yiheng Shen
Xiaolin Ju
Xiang Chen
Guang Yang
Affiliations
[1] Nantong University, School of Information Science and Technology
[2] Nanjing University of Aeronautics and Astronautics, College of Computer Science and Technology
Keywords
Bash code; Code comment generation; Adversarial training; Data augmentation
Abstract
Understanding Bash code is challenging for developers because of its flexible syntax and unique features. Compared with comment generation for popular programming languages, Bash comment generation lacks sufficient training data, and collecting more real Bash code with corresponding comments is time-consuming and labor-intensive. In this study, we propose Bash2Com, a two-module method for Bash code comment generation. The first module, NP-GD, is a gradient-based automatic data augmentation component that enhances normalization stability when generating adversarial examples. The second module, MASA, leverages CodeBERT to learn the rich semantics of Bash code. Specifically, MASA treats the representations learned at each layer of CodeBERT as a set of semantic information that captures recursive relationships within the code. To generate comments for different Bash snippets, MASA employs LSTM and attention mechanisms to dynamically concentrate on the relevant representational information. We then use a Transformer decoder with beam search to generate the code comments. To evaluate the effectiveness of Bash2Com, we use a corpus of 10,592 Bash code snippets and their corresponding comments. Experimental results show that Bash2Com outperforms all state-of-the-art baselines by at least 10.19%, 11.81%, 2.61%, and 6.13% in terms of BLEU-3, BLEU-4, METEOR, and ROUGE-L, respectively. Moreover, ablation studies verify the rationality of NP-GD and MASA in Bash2Com. Finally, we conduct a human evaluation to illustrate the effectiveness of Bash2Com from practitioners’ perspectives.
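The layer-wise attention idea described in the abstract can be illustrated with a small Python sketch. This is not the authors' MASA implementation: the LSTM is omitted, plain scaled dot-product attention is assumed, and the function names, the query vector, and the toy layer vectors are all hypothetical. It only shows how one summary vector per encoder layer might be weighted and pooled so that more relevant layers contribute more.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_layers(
    layer_vectors: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Weight per-layer vectors by scaled dot-product attention.

    Returns (weights, pooled), where weights has one entry per layer
    and pooled is the attention-weighted sum of the layer vectors.
    """
    dim = len(query)
    scale = math.sqrt(dim)
    # One relevance score per layer: scaled dot product with the query.
    scores = [
        sum(q * v for q, v in zip(query, vec)) / scale
        for vec in layer_vectors
    ]
    weights = softmax(scores)
    # Weighted sum across layers, computed one dimension at a time.
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, layer_vectors))
        for i in range(dim)
    ]
    return weights, pooled


# Toy stand-ins for three layers' summary vectors (hypothetical values,
# not real CodeBERT outputs).
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
weights, pooled = attend_over_layers(layers, query)
```

In this toy input the third layer vector aligns best with the query, so it receives the largest weight. In a real setting the per-layer vectors would come from the encoder's hidden states and the query would be learned rather than fixed.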