Bash Code Comment Generation Method Based on Dual Information Retrieval

Authors
Chen X. [1,2]
Yu C. [1]
Yang G. [1]
Pu X.-L. [3]
Cui Z.-Q. [4]
Affiliations
[1] School of Information Science and Technology, Nantong University, Nantong
[2] State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing
[3] Economics and Management School, Nantong University, Nantong
[4] School of Computer, Beijing Information Science and Technology University, Beijing
Source
Ruan Jian Xue Bao/Journal of Software | 2023, Vol. 34, No. 3
Keywords
Bash code; code comment generation; code lexical; code semantic; information retrieval; program comprehension
DOI
10.13328/j.cnki.jos.006690
Abstract
Bash is the default shell command language for Linux and plays an important role in the development and maintenance of Linux systems. Nevertheless, understanding the purpose and functionality of Bash code is a challenging task. Therefore, ExplainBash, an automatic Bash code comment generation method based on dual information retrieval, is proposed. Specifically, the method retrieves on both semantic similarity and lexical similarity, with the aim of generating high-quality code comments. For semantic similarity, CodeBERT and the BERT-whitening operator are used to learn the semantic representation of the code, and Euclidean distance is used to compute the similarity. For lexical similarity, the code is represented as a set of code tokens, and edit distance is used to compute the similarity. A high-quality corpus is constructed from the corpus shared in the NL2Bash study and the data shared in the NLC2CMD competition. Nine state-of-the-art baselines are then selected from the automatic code comment generation domain, covering both information retrieval-based and deep learning-based methods. The results of an empirical study and a human study verify the effectiveness of the proposed method. Ablation experiments are also designed to analyze the rationality of the settings in the proposed method, such as the retrieval strategy and the BERT-whitening operator. Finally, a browser plug-in is developed based on the proposed method to facilitate the comprehension of Bash code. © 2023 Chinese Academy of Sciences. All rights reserved.
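The two-stage retrieval described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the way the two stages are combined (semantic top-k first, then lexical re-ranking), the parameter `k`, the toy corpus, and the hand-written two-dimensional vectors are all assumptions made for illustration. In the paper the vectors come from CodeBERT followed by the BERT-whitening operator; here they are assumed to be precomputed.

```python
import math
from typing import List, Sequence, Tuple

# Each corpus entry: (Bash code, precomputed semantic vector, comment).
Entry = Tuple[str, List[float], str]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def token_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance computed over tokens instead of characters."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def retrieve_comment(query_code: str, query_vec: Sequence[float],
                     corpus: Sequence[Entry], k: int = 2) -> str:
    """Assumed combination: keep the k semantically closest entries,
    then return the comment of the lexically closest one among them."""
    candidates = sorted(corpus, key=lambda e: euclidean(query_vec, e[1]))[:k]
    query_tokens = query_code.split()  # whitespace tokenization (assumption)
    best = min(candidates,
               key=lambda e: token_edit_distance(query_tokens, e[0].split()))
    return best[2]


# Toy corpus with hand-written vectors (hypothetical data).
corpus: List[Entry] = [
    ("find . -name '*.txt'", [0.9, 0.1],
     "Find all .txt files under the current directory"),
    ("find . -name '*.log' -delete", [0.8, 0.2],
     "Delete all .log files under the current directory"),
    ("tar -czf out.tar.gz dir", [0.1, 0.9],
     "Create a gzip-compressed archive of dir"),
]

comment = retrieve_comment("find . -name '*.md'", [0.85, 0.15], corpus)
print(comment)
```

In this toy run the two `find` commands survive the semantic stage, and the lexical stage picks the one that differs from the query by a single token, so its comment is reused for the query.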
Pages: 1310-1329
Page count: 19