Syntax-aware on-the-fly code completion

Cited by: 3
Authors
Takerngsaksiri, Wannita [1]
Tantithamthavorn, Chakkrit [1]
Li, Yuan-Fang [1]
Affiliations
[1] Monash Univ, Melbourne, Vic, Australia
Funding
Australian Research Council;
Keywords
Code completion; Multi-task learning;
DOI
10.1016/j.infsof.2023.107336
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Context: Code completion aims to improve developers' productivity by suggesting the next code tokens from a given context. Various approaches have been proposed to incorporate abstract syntax tree (AST) information into model training, ensuring that code completion is aware of the syntax of the programming language. However, existing syntax-aware code completion approaches are not on-the-fly: we found that for two-thirds of the characters that developers type, an AST cannot be extracted, because AST construction requires syntactically correct source code, limiting its practicality in real-world scenarios. On the other hand, existing on-the-fly code completion approaches do not yet consider syntactic information. Objective: In this paper, we propose PyCoder, which leverages token types, a kind of lightweight syntactic information that is readily available and aligns with the natural order of source code. Method: PyCoder is trained in a multi-task manner: by learning the supporting task of predicting token types during the training phase, the model achieves better performance on predicting tokens and lines of code without needing token types in the inference phase. Results: Comprehensive experiments show that PyCoder achieves first rank on the CodeXGLUE leaderboard with an accuracy of 77.12% for token-level predictions, which is 0.43%-24.25% more accurate than baselines. In addition, PyCoder achieves an exact match of 43.37% for line-level predictions, which is 3.63%-84.73% more accurate than baselines. Conclusions: These results lead us to conclude that token type information (an alternative form of syntactic information), which has rarely been used in the past, can greatly improve the performance of code completion approaches, without requiring syntactically correct source code as AST-based approaches do. PyCoder is publicly available on HuggingFace and GitHub.
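The abstract's motivating observation — that an AST cannot be extracted from partially typed code, while token types remain available — can be illustrated with a minimal, self-contained sketch using Python's standard library (this is an illustration of the general idea, not code from the paper; the statement `result = compute(x, ` and the name `compute` are purely hypothetical):

```python
import ast
import io
import tokenize

# A partially typed statement, as it looks mid-keystroke:
# the developer has not yet closed the function call.
partial_code = "result = compute(x, "

# AST extraction requires syntactically correct source code,
# so it fails on this prefix.
try:
    ast.parse(partial_code)
    ast_available = True
except SyntaxError:
    ast_available = False

# Token types, by contrast, are still available for the typed prefix;
# the tokenizer yields them left to right before hitting end of input.
token_types = []
try:
    for tok in tokenize.generate_tokens(io.StringIO(partial_code).readline):
        token_types.append(tokenize.tok_name[tok.type])
except (tokenize.TokenError, SyntaxError):
    pass  # the unfinished call raises at end of input; keep what was yielded

print("AST available:", ast_available)
print("Token types for the prefix:", token_types)
```

The prefix yields type names such as NAME and OP even though parsing fails, which is why token types qualify as lightweight syntactic information that stays usable on the fly. In PyCoder itself, token-type prediction is only an auxiliary training task, so no tokenizer is needed at inference time.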
Pages: 15