Bash comment generation via data augmentation and semantic-aware CodeBERT

Cited by: 0
Authors
Yiheng Shen
Xiaolin Ju
Xiang Chen
Guang Yang
Affiliations
[1] Nantong University, School of Information Science and Technology
[2] Nanjing University of Aeronautics and Astronautics, College of Computer Science and Technology
Keywords
Bash code; Code comment generation; Adversarial training; Data augmentation
DOI
Not available
Abstract
Understanding Bash code is challenging for developers because of its flexible syntax and unique features. Compared with comment generation for popular programming languages, Bash comment generation lacks sufficient training data, and collecting more real Bash code with corresponding comments is time-consuming and labor-intensive. In this study, we propose Bash2Com, a two-module method for Bash code comment generation. The first module, NP-GD, is a gradient-based automatic data augmentation component that enhances normalization stability when generating adversarial examples. The second module, MASA, leverages CodeBERT to learn the rich semantics of Bash code. Specifically, MASA treats the representations learned at each layer of CodeBERT as a set of semantic information that captures recursive relationships within the code. To generate comments for different Bash snippets, MASA employs LSTM and attention mechanisms to dynamically concentrate on the relevant representations. We then use a Transformer decoder with beam search to generate the code comments. To evaluate Bash2Com, we use a corpus of 10,592 Bash code snippets and their corresponding comments. Experimental results show that Bash2Com outperforms all state-of-the-art baselines by at least 10.19%, 11.81%, 2.61%, and 6.13% in terms of BLEU-3, BLEU-4, METEOR, and ROUGE-L, respectively. Moreover, ablation studies verify the rationality of NP-GD and MASA in Bash2Com. Finally, we conduct a human evaluation to illustrate the effectiveness of Bash2Com from practitioners’ perspectives.
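The idea of attending over per-layer representations can be illustrated with a minimal sketch. This is not the authors' MASA implementation: it assumes plain dot-product attention, omits the LSTM entirely, and uses toy two-dimensional vectors in place of real CodeBERT layer outputs. The names `attend_over_layers` and `query` are hypothetical and introduced only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_layers(layer_vectors, query):
    """Fuse one vector per encoder layer into a single vector.

    layer_vectors: list of equal-length vectors, one per layer
                   (toy stand-ins for CodeBERT layer outputs; assumption).
    query:         hypothetical vector standing in for a learned query.
    Returns (weights, fused), where weights sum to 1.
    """
    # Score each layer by its dot product with the query.
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in layer_vectors]
    weights = softmax(scores)
    # Weighted sum of the layer vectors, dimension by dimension.
    dim = len(layer_vectors[0])
    fused = [
        sum(w * vec[i] for w, vec in zip(weights, layer_vectors))
        for i in range(dim)
    ]
    return weights, fused


# Three toy "layers"; the query favors the second dimension.
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, fused = attend_over_layers(layers, query=[0.0, 2.0])
```

In this toy run the first layer scores lowest against the query, so it receives the smallest weight, while the other two layers share the remaining weight equally. A real model would learn the query (or derive it from an LSTM state) rather than fix it by hand.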