Automatic Code Semantic Tag Generation Approach Based on Software Knowledge Graph

Authors
Xing S.-S. [1, 2]
Liu M.-W. [1, 2]
Peng X. [1, 2]
Affiliations
[1] School of Computer Science, Fudan University, Shanghai
[2] Shanghai Key Laboratory of Data Science, Fudan University, Shanghai
Source
Ruan Jian Xue Bao/Journal of Software | 2022 / Vol. 33 / No. 11
Keywords
code search; knowledge graph; program comprehension; semantic tag
DOI
10.13328/j.cnki.jos.006369
Abstract
Code snippets in open-source and enterprise software projects, as well as those posted on software development websites, are important software development resources. However, developers' code search needs often reflect high-level intentions and topics, which are difficult to satisfy with code search techniques based on information retrieval. It is therefore highly desirable for code snippets to be accompanied by semantic tags that reflect their high-level intentions and topics, to facilitate code search and understanding. Existing tag generation techniques are mainly oriented to text content or rely on historical data, and cannot meet the needs of large-scale code semantic annotation or of assisting code search and understanding. To address this issue, this study proposes KGCodeTagger, an approach based on a software knowledge graph that automatically generates semantic tags for code snippets. KGCodeTagger constructs a software knowledge graph from concepts and relations extracted from API documentation and software development Q&A text, and uses the knowledge graph as the basis for code semantic tag generation. Given a code snippet, KGCodeTagger identifies and extracts API invocations and concept mentions, and then links them to the corresponding concepts in the software knowledge graph. On this basis, the approach identifies other concepts related to the linked concepts as candidates and selects semantic tags from these candidates based on diversity and representativeness. Both the software knowledge graph construction steps of KGCodeTagger and the quality of the generated code tags are evaluated. The results show that KGCodeTagger can produce a high-quality and meaningful software knowledge graph and code semantic tags, which can help developers quickly understand the intention of the code. © 2022 Chinese Academy of Sciences. All rights reserved.
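The pipeline described in the abstract (link code elements to graph concepts, expand to related concepts, then select tags by diversity and representativeness) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy graph `TOY_GRAPH`, the alias table `ALIASES`, the function names, and in particular the scoring rule, which simply counts how many linked concepts support a candidate (representativeness) and subtracts a penalty for concepts already covered by earlier tags (diversity). KGCodeTagger's actual extraction, linking, and ranking methods are not specified in the abstract.

```python
import re

# Toy knowledge graph (hypothetical data): concept -> set of related concepts.
TOY_GRAPH = {
    "java.io.BufferedReader": {"buffered input", "file reading", "character stream"},
    "java.io.FileReader": {"file reading", "character stream"},
    "java.util.ArrayList": {"dynamic array", "collection"},
}

# Hypothetical alias table mapping simple names found in code to graph concepts.
ALIASES = {
    "BufferedReader": "java.io.BufferedReader",
    "FileReader": "java.io.FileReader",
    "ArrayList": "java.util.ArrayList",
}


def link_concepts(snippet):
    """Find identifiers in the snippet and link them to graph concepts."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", snippet)
    linked = []
    for token in tokens:
        concept = ALIASES.get(token)
        if concept is not None and concept not in linked:
            linked.append(concept)
    return linked


def candidate_support(linked, graph):
    """Map each related concept to the set of linked concepts that reach it."""
    support = {}
    for concept in linked:
        for neighbor in graph.get(concept, ()):
            support.setdefault(neighbor, set()).add(concept)
    return support


def select_tags(support, k=3, diversity_weight=0.5):
    """Greedy selection: reward coverage of linked concepts, penalize overlap."""
    selected = []
    covered = set()
    remaining = dict(support)
    while remaining and len(selected) < k:
        def score(tag):
            sources = remaining[tag]
            representativeness = len(sources)
            overlap = len(sources & covered)
            return representativeness - diversity_weight * overlap
        # The sort key breaks ties alphabetically so the output is deterministic.
        best = min(remaining, key=lambda t: (-score(t), t))
        selected.append(best)
        covered |= remaining.pop(best)
    return selected


def generate_tags(snippet, graph=TOY_GRAPH, k=3):
    """Run the three illustrative steps: link, expand, select."""
    linked = link_concepts(snippet)
    return select_tags(candidate_support(linked, graph), k=k)
```

Calling `generate_tags` on a Java snippet that uses `BufferedReader`, `FileReader`, and `ArrayList` returns up to `k` tags drawn from the toy graph. Raising `diversity_weight` pushes the selection toward tags supported by concepts that earlier tags have not yet covered.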
Pages: 4027-4045
Page count: 18