Automatic Code Semantic Tag Generation Approach Based on Software Knowledge Graph

Cited by: 0
Authors
Xing S.-S. [1, 2]
Liu M.-W. [1, 2]
Peng X. [1, 2]
Affiliations
[1] School of Computer Science, Fudan University, Shanghai
[2] Shanghai Key Laboratory of Data Science, Fudan University, Shanghai
Source
Ruan Jian Xue Bao/Journal of Software | 2022, Vol. 33, No. 11
Keywords
code search; knowledge graph; program comprehension; semantic tag
DOI
10.13328/j.cnki.jos.006369
Abstract
Code snippets in open-source and enterprise software projects, and those posted on various software development websites, are important software development resources. However, developers' code search needs often reflect high-level intentions and topics, which are difficult to satisfy with code search techniques based on information retrieval. It is thus highly desirable for code snippets to be accompanied by semantic tags that reflect their high-level intentions and topics, in order to facilitate code search and understanding. Existing tag generation technologies are mainly oriented to text content or rely on historical data, and cannot meet the needs of large-scale semantic annotation of code or of assisting code search and understanding. To address this issue, this study proposes an approach based on a software knowledge graph (called KGCodeTagger) that automatically generates semantic tags for code snippets. KGCodeTagger constructs a software knowledge graph from concepts and relations extracted from API documentation and software development Q&A text, and uses this knowledge graph as the basis for code semantic tag generation. Given a code snippet, KGCodeTagger identifies and extracts API invocations and concept mentions, and then links them to the corresponding concepts in the software knowledge graph. On this basis, the approach further identifies other concepts related to the linked concepts as candidates and selects semantic tags from these candidates based on diversity and representativeness. Both the knowledge graph construction steps of KGCodeTagger and the quality of the generated code tags are evaluated. The results show that KGCodeTagger can produce a high-quality, meaningful software knowledge graph and code semantic tags, which can help developers quickly understand the intention of code. © 2022 Chinese Academy of Sciences. All rights reserved.
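The tagging pipeline summarized in the abstract (link API invocations and concept mentions, expand to related concepts, then pick diverse and representative tags) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration, not KGCodeTagger's actual implementation: the hand-written toy graph `KG`, the hypothetical name table `API_INDEX`, the regex-based extraction (a practical extractor would more likely parse the code), and the scoring. Representativeness is approximated by how many linked concepts a candidate is adjacent to, and diversity by a Jaccard-similarity penalty in a greedy selection modeled on maximal marginal relevance (MMR), a swapped-in technique rather than the paper's.

```python
import re
from collections import Counter

# Hypothetical toy knowledge graph: concept -> related concepts.
# KGCodeTagger mines its graph from API docs and Q&A text; this one is hand-written.
KG = {
    "HashMap.put": {"hash map", "key-value pair"},
    "HashMap.get": {"hash map", "lookup"},
    "BufferedReader.readLine": {"file reading", "line", "input stream"},
    "FileReader": {"file reading", "character stream"},
    "hash map": {"key-value pair", "lookup", "collection"},
    "file reading": {"input stream", "i/o"},
}

# Hypothetical index from simple call names to API concepts in KG.
API_INDEX = {
    "put": "HashMap.put",
    "get": "HashMap.get",
    "readLine": "BufferedReader.readLine",
    "FileReader": "FileReader",
}
API_CONCEPTS = set(API_INDEX.values())


def link_apis(code):
    """Link method calls (.name(...)) and constructor calls (new Name(...)) to API concepts."""
    names = re.findall(r"(?:\.|\bnew\s+)(\w+)\s*\(", code)
    return [API_INDEX[n] for n in names if n in API_INDEX]


def link_mentions(code):
    """Link non-API concept names that appear verbatim in // comments."""
    comments = " ".join(re.findall(r"//(.*)", code)).lower()
    return [c for c in KG if c not in API_CONCEPTS and c in comments]


def neighbors(concept):
    return KG.get(concept, set())


def candidates(linked):
    """Count, for each related concept, how many linked concepts point to it."""
    counts = Counter()
    for concept in linked:
        for related in neighbors(concept):
            counts[related] += 1
    return counts


def jaccard(a, b):
    """Similarity of two concepts as overlap of their closed neighborhoods."""
    sa, sb = neighbors(a) | {a}, neighbors(b) | {b}
    return len(sa & sb) / len(sa | sb)


def select_tags(counts, k=3, lam=0.7):
    """Greedy MMR-style selection: reward representativeness, penalize redundancy."""
    top = max(counts.values()) if counts else 1
    pool, selected = dict(counts), []
    while pool and len(selected) < k:
        def score(c):
            rep = pool[c] / top
            red = max((jaccard(c, s) for s in selected), default=0.0)
            return lam * rep - (1 - lam) * red
        best = max(sorted(pool), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        del pool[best]
    return selected


snippet = """
// count lines read from a file into a hash map
Map<String, Integer> counts = new HashMap<>();
BufferedReader reader = new BufferedReader(new FileReader(path));
String line = reader.readLine();
counts.put(line, counts.get(line) + 1);
"""

linked = link_apis(snippet) + link_mentions(snippet)
print(select_tags(candidates(linked)))  # ['file reading', 'hash map', 'key-value pair']
```

The weight `lam` trades representativeness against diversity: lowering it strengthens the redundancy penalty, while `lam=1.0` zeroes the penalty term so selection reduces to ranking candidates by neighbor count with alphabetical tie-breaking.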
Pages: 4027-4045
Page count: 18