Syntax-aware on-the-fly code completion

Cited by: 3
Authors
Takerngsaksiri, Wannita [1]
Tantithamthavorn, Chakkrit [1]
Li, Yuan-Fang [1]
Affiliations
[1] Monash Univ, Melbourne, Vic, Australia
Funding
Australian Research Council;
Keywords
Code completion; Multi-task learning;
DOI
10.1016/j.infsof.2023.107336
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Context: Code completion aims to improve developers' productivity by suggesting the next code tokens from a given context. Various approaches have been proposed to incorporate abstract syntax tree (AST) information into model training, ensuring that code completion is aware of the syntax of the programming language. However, existing syntax-aware code completion approaches are not on-the-fly: we found that for two-thirds of the characters that developers type, an AST cannot be extracted, because AST construction requires syntactically correct source code, limiting its practicality in real-world scenarios. On the other hand, existing on-the-fly code completion approaches do not yet consider syntactic information. Objective: In this paper, we propose PyCoder, which leverages token types, a kind of lightweight syntactic information that is readily available and aligns with the natural order of source code. Method: PyCoder is trained in a multi-task manner: by learning the supporting task of predicting token types during the training phase, the model achieves better performance on predicting tokens and lines of code without needing token types in the inference phase. Results: Comprehensive experiments show that PyCoder achieves first rank on the CodeXGLUE leaderboard with an accuracy of 77.12% for token-level predictions, which is 0.43%-24.25% more accurate than baselines. In addition, PyCoder achieves an exact match of 43.37% for line-level predictions, which is 3.63%-84.73% more accurate than baselines. Conclusions: These results lead us to conclude that token type information (an alternative form of syntactic information), which has rarely been used in the past, can greatly improve the performance of code completion approaches, without requiring syntactically correct source code as AST-based approaches do. PyCoder is publicly available on HuggingFace and GitHub.
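The abstract's motivating observation — that an AST cannot be extracted from partially typed code, while token types remain available — can be illustrated with a minimal, self-contained sketch using Python's standard library (this is an illustration of the general idea, not code from the paper; the statement `result = compute(x, ` and the name `compute` are purely hypothetical):

```python
import ast
import io
import tokenize

# A partially typed statement, as it looks mid-keystroke:
# the developer has not yet closed the function call.
partial_code = "result = compute(x, "

# AST extraction requires syntactically correct source code,
# so it fails on this prefix.
try:
    ast.parse(partial_code)
    ast_available = True
except SyntaxError:
    ast_available = False

# Token types, by contrast, are still available for the typed prefix;
# the tokenizer yields them left to right before hitting end of input.
token_types = []
try:
    for tok in tokenize.generate_tokens(io.StringIO(partial_code).readline):
        token_types.append(tokenize.tok_name[tok.type])
except (tokenize.TokenError, SyntaxError):
    pass  # the unfinished call raises at end of input; keep what was yielded

print("AST available:", ast_available)
print("Token types for the prefix:", token_types)
```

The prefix yields type names such as NAME and OP even though parsing fails, which is why token types qualify as lightweight syntactic information that stays usable on the fly. In PyCoder itself, token-type prediction is only an auxiliary training task, so no tokenizer is needed at inference time.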
Pages: 15