Syntax-aware on-the-fly code completion

Cited by: 3
Authors
Takerngsaksiri, Wannita [1]
Tantithamthavorn, Chakkrit [1]
Li, Yuan-Fang [1]
Affiliations
[1] Monash Univ, Melbourne, Vic, Australia
Funding
Australian Research Council;
Keywords
Code completion; Multi-task learning;
DOI
10.1016/j.infsof.2023.107336
Chinese Library Classification
TP [Automation and computer technology];
Discipline classification code
0812;
Abstract
Context: Code completion aims to improve developers' productivity by suggesting the next code tokens from a given context. Various approaches have been proposed that incorporate abstract syntax tree (AST) information into model training, so that code completion is aware of the syntax of the programming language. However, existing syntax-aware code completion approaches are not on-the-fly: we found that for two-thirds of the characters that developers type, an AST cannot be extracted, because extraction requires syntactically correct source code. This limits their practicality in real-world scenarios. Conversely, existing on-the-fly code completion approaches do not yet consider syntactic information.

Objective: In this paper, we propose PyCoder, which leverages token types, a kind of lightweight syntactic information that is readily available and aligns with the natural order of source code.

Method: PyCoder is trained in a multi-task manner. By learning the supporting task of predicting token types during the training phase, the models achieve better performance on predicting tokens and lines of code, without needing token types in the inference phase.

Results: Comprehensive experiments show that PyCoder ranks first on the CodeXGLUE leaderboard with an accuracy of 77.12% for token-level predictions, which is 0.43%-24.25% more accurate than the baselines. In addition, PyCoder achieves an exact match of 43.37% for line-level predictions, which is 3.63%-84.73% more accurate than the baselines.

Conclusions: These results lead us to conclude that token type information (an alternative to syntactic information), which has rarely been used in the past, can greatly improve the performance of code completion approaches, without requiring syntactically correct source code as AST-based approaches do. PyCoder is publicly available on HuggingFace and GitHub.
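
The contrast the abstract draws between AST extraction and token types can be illustrated with Python's standard library alone. The following is a minimal sketch, not PyCoder's implementation: the helper names `token_types` and `ast_available`, and the `KEYWORD` label, are assumptions introduced here for illustration, and the token-type vocabulary actually used by PyCoder may differ. It shows that a half-typed line cannot be parsed into an AST, while the lexer can still assign a type to every token typed so far, in source order.

```python
import ast
import io
import keyword
import token
import tokenize

# Layout-only tokens that carry no completion-relevant text (an illustrative choice).
_SKIPPED = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
            tokenize.DEDENT, tokenize.ENDMARKER}


def ast_available(source: str) -> bool:
    """Return True only if the source is syntactically complete enough to parse."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def token_types(source: str) -> list:
    """Return (token text, token type) pairs in source order.

    Hypothetical helper: keywords are relabelled KEYWORD, because Python's
    lexer reports them as NAME. Tokens lexed before any lexer error are kept.
    """
    pairs = []
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type in _SKIPPED:
                continue
            if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
                label = "KEYWORD"
            else:
                label = token.tok_name[tok.type]
            pairs.append((tok.string, label))
    except (tokenize.TokenError, SyntaxError):
        pass  # incomplete input: keep whatever was lexed so far
    return pairs


if __name__ == "__main__":
    partial = "def add(a, b):\n    return a +"  # the developer is still typing
    print("AST available:", ast_available(partial))
    for text, label in token_types(partial):
        print(f"{label:8} {text}")
```

For the partial snippet in the demo, parsing fails, yet the lexer still yields one typed pair per token, ending with the `+` operator. Pairs of this kind could serve as labels for an auxiliary token-type prediction task during training, which is the general multi-task idea the abstract describes.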
Pages: 15
Related papers
50 in total
  • [41] Syntax-Aware Aspect Level Sentiment Classification with Graph Attention Networks
    Huang, Binxuan
    Carley, Kathleen M.
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019 : 5469 - 5477
  • [42] On-the-Fly Power-Aware Rendering
    Zhang, Yunjin
    Ortin, Marta
    Arellano, Victor
    Wang, Rui
    Gutierrez, Diego
    Bao, Hujun
    COMPUTER GRAPHICS FORUM, 2018, 37 (04) : 155 - 166
  • [43] Syntax-Aware Opinion Role Labeling with Dependency Graph Convolutional Networks
    Zhang, Bo
    Zhang, Yue
    Wang, Rui
    Li, Zhenghua
    Zhang, Min
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020 : 3249 - 3258
  • [44] Leveraging syntax-aware models and triaffine interactions for nominal chain extraction
    Lou, Yinxia
    Zhu, Xun
    Chen, Ming
    Ji, Donghong
    Zhou, Junxiang
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2024, 36 (07)
  • [45] Syntax-aware neural machine translation directed by syntactic dependency degree
    Ru Peng
    Tianyong Hao
    Yi Fang
    Neural Computing and Applications, 2021, 33 : 16609 - 16625
  • [46] GraphSpeech: Syntax-Aware Graph Attention Network for Neural Speech Synthesis
    Liu, Rui
    Sisman, Berrak
    Li, Haizhou
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021 : 6059 - 6063
  • [47] Syntax-aware neural machine translation directed by syntactic dependency degree
    Peng, Ru
    Hao, Tianyong
    Fang, Yi
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (23) : 16609 - 16625
  • [48] On-the-Fly Syntax Highlighting: Generalisation and Speed-Ups
    Palma, Marco Edoardo
    Wolf, Alex
    Salza, Pasquale
    Gall, Harald C.
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2025, 51 (02) : 355 - 370
  • [49] Syntax-aware Neural Semantic Role Labeling for Morphologically Rich Languages
    Vasic, Daniel
    Vasic, Mirela Kundid
    2020 28TH INTERNATIONAL CONFERENCE ON SOFTWARE, TELECOMMUNICATIONS AND COMPUTER NETWORKS (SOFTCOM), 2020 : 327 - 332
  • [50] Fuzzing JavaScript engines with a syntax-aware neural program model
    Xu, Haoran
    Wang, Yongjun
    Jiang, Zhiyuan
    Fan, Shuhui
    Fu, Shaojing
    Xie, Peidai
    COMPUTERS & SECURITY, 2024, 144