Bash comment generation via data augmentation and semantic-aware CodeBERT

Authors
Yiheng Shen
Xiaolin Ju
Xiang Chen
Guang Yang
Affiliations
[1] Nantong University, School of Information Science and Technology
[2] Nanjing University of Aeronautics and Astronautics, College of Computer Science and Technology
Source
Automated Software Engineering, 2024, 31 (01)
Keywords
Bash code; Code comment generation; Adversarial training; Data augmentation
Abstract
Bash code is hard for developers to understand because of its flexible syntax and unique features. Compared with comment generation for popular programming languages, Bash comment generation has far less training data available, and collecting more real Bash code with corresponding comments is time-consuming and labor-intensive. In this study, we propose a two-module method named Bash2Com for Bash code comment generation. The first module, NP-GD, is a gradient-based automatic data augmentation component that enhances normalization stability when generating adversarial examples. The second module, MASA, leverages CodeBERT to learn the rich semantics of Bash code. Specifically, MASA treats the representations learned at each layer of CodeBERT as a set of semantic information that captures recursive relationships within the code. To generate comments for different Bash snippets, MASA employs LSTM and attention mechanisms to dynamically concentrate on the relevant representational information. We then use a Transformer decoder and the beam search algorithm to generate code comments. To evaluate the effectiveness of Bash2Com, we use a corpus of 10,592 Bash code snippets and their corresponding comments. Experimental results show that Bash2Com outperforms all state-of-the-art baselines by at least 10.19%, 11.81%, 2.61%, and 6.13% in terms of BLEU-3, BLEU-4, METEOR, and ROUGE-L, respectively. Moreover, ablation studies verify the rationality of NP-GD and MASA in Bash2Com. Finally, we conduct a human evaluation to illustrate the effectiveness of Bash2Com from practitioners’ perspectives.
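The idea of attending over per-layer representations can be illustrated with a small sketch. This is not the authors' MASA implementation: the abstract does not give its equations, so the version below omits the LSTM entirely and assumes plain scaled dot-product attention over one toy vector per layer. The names `attend_over_layers`, `layers`, and `query` are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend_over_layers(layer_vectors, query):
    """Fuse one vector per encoder layer into a single vector.

    Assumption: each layer is scored against `query` with scaled
    dot-product attention; the scores are normalized with softmax and
    used as weights for a weighted sum of the layer vectors.
    """
    dim = len(query)
    scores = [dot(vec, query) / math.sqrt(dim) for vec in layer_vectors]
    weights = softmax(scores)
    fused = [
        sum(w * vec[i] for w, vec in zip(weights, layer_vectors))
        for i in range(dim)
    ]
    return weights, fused


# Toy stand-ins for three layer outputs (a real encoder such as CodeBERT
# would supply one hidden-state vector per layer).
layers = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
# Hypothetical query vector that is most similar to the second layer.
query = [0.0, 3.0, 0.0]

weights, fused = attend_over_layers(layers, query)
print("layer weights:", weights)
print("fused vector:", fused)
```

In this toy setup the second layer receives the largest weight because it aligns best with the query, which is the sense in which attention can "concentrate" on the most relevant layer for a given input.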
Related papers
  • [1] Bash comment generation via data augmentation and semantic-aware CodeBERT
    Shen, Yiheng
    Ju, Xiaolin
    Chen, Xiang
    Yang, Guang
    AUTOMATED SOFTWARE ENGINEERING, 2024, 31 (01)
  • [2] Semantic-Aware Data Augmentation for Text-to-Image Synthesis
    Tan, Zhaorui
    Yang, Xi
    Huang, Kaizhu
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6, 2024, : 5098 - 5107
  • [4] WASMaker: Differential Testing of WebAssembly Runtimes via Semantic-Aware Binary Generation
    Cao, Shangtong
    He, Ningyu
    She, Xinyu
    Zhang, Yixuan
    Zhang, Mu
    Wang, Haoyu
    PROCEEDINGS OF THE 33RD ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2024, 2024, : 1262 - 1273
  • [5] BASHEXPLAINER: Retrieval-Augmented Bash Code Comment Generation based on Fine-tuned CodeBERT
    Yu, Chi
    Yang, Guang
    Chen, Xiang
    Liu, Ke
    Zhou, Yanlin
    2022 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2022), 2022, : 82 - 93
  • [6] Semantic-Aware Fingerprints of Symbolic Research Data
    Graebe, Hans-Gert
    MATHEMATICAL SOFTWARE, ICMS 2016, 2016, 9725 : 411 - 418
  • [7] A semantic-aware data generator for ETL workflows
    Du, Naiqiao
    Ye, Xiaojun
    Wang, Jianmin
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2016, 28 (04): : 1016 - 1040
  • [8] A semantic-aware log generation method for network activities
    Yichiet, Aun
    Khaw, Yen-Min Jasmina
    Gan, Ming-Lee
    Ponnusamy, Vasaki
    INTERNATIONAL JOURNAL OF INFORMATION SECURITY, 2022, 21 (02) : 161 - 177
  • [10] Semantic-Aware Generator and Low-level Feature Augmentation for Few-shot Image Generation
    Wang, Zhe
    Guan, Jiaoyan
    Yang, Mengping
    Xiao, Ting
    Chi, Ziqiu
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5079 - 5088