Deep Learning Based Code Generation Methods: Literature Review

Cited by: 0
Authors
Yang Z.-Z. [1 ]
Chen S.-R. [1 ]
Gao C.-Y. [1 ]
Li Z.-H. [2 ]
Li G. [3 ]
Lyu M.R.-T. [4 ]
Affiliations
[1] School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen
[2] Huawei Technologies Co. Ltd., Shenzhen
[3] School of Electronics Engineering and Computer Science, Peking University, Beijing
[4] Department of Computer Science and Engineering, Chinese University of Hong Kong, Hong Kong
Source
Ruan Jian Xue Bao/Journal of Software | 2024 / Vol. 35 / No. 2
Keywords
code generation; code retrieval; deep learning; machine translation; post-processing
DOI
10.13328/j.cnki.jos.006981
Abstract
This study focuses on the code generation task, which aims to generate relevant code fragments from given natural language descriptions. In software development, developers often face two scenarios: writing large amounts of repetitive, low-technical code to implement common functionalities, and writing code tied to specific task requirements, which may call on external resources such as documentation or other tools. Code generation has therefore received considerable attention in academia and industry as a means of assisting developers in coding, and enabling machines to understand users' requirements and write programs on their own has long been a key concern in software engineering. Recent advances in deep learning techniques, especially pre-trained models, have enabled promising performance on the code generation task. This study systematically reviews current work on deep learning-based code generation and classifies existing methods into three categories: methods based on code features, methods incorporating retrieval, and methods incorporating post-processing. The first category covers methods that apply deep learning algorithms to code generation on the basis of code features, while the second and third categories improve upon the performance of the first. The existing research results of each category are systematically reviewed, summarized, and commented on. In addition, the study analyzes the corpora and the popular evaluation metrics used in existing code generation work. Finally, it summarizes the overall literature review and outlines future research directions worthy of attention. © 2024 Chinese Academy of Sciences. All rights reserved.
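The abstract refers to the field's "popular evaluation metrics" only in general terms. As an illustrative aside, not taken from the paper itself, the minimal Python sketch below shows the unbiased pass@k estimator commonly used for functional-correctness evaluation of generated code; the function name pass_at_k, the NumPy dependency, and the example counts are assumptions made here for illustration.

    import numpy as np

    def pass_at_k(n: int, c: int, k: int) -> float:
        """Unbiased pass@k estimate for a single problem.

        n: total candidate programs sampled for the problem
        c: number of candidates that pass all unit tests
        k: evaluation budget (how many samples a user would try)
        """
        if n - c < k:
            return 1.0  # every size-k subset contains a passing candidate
        # 1 - C(n-c, k) / C(n, k), computed as a numerically stable product
        return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

    # Hypothetical benchmark: (n, c) counts for three problems
    results = [(200, 7), (200, 0), (200, 42)]
    print(sum(pass_at_k(n, c, k=10) for n, c in results) / len(results))

Benchmark-level scores are then obtained by averaging the per-problem estimates, which is how functional-correctness metrics of this kind are typically reported.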
Pages: 604-628
Page count: 24