Syntax-aware on-the-fly code completion

Cited by: 3
Authors
Takerngsaksiri, Wannita [1]
Tantithamthavorn, Chakkrit [1]
Li, Yuan-Fang [1]
Affiliations
[1] Monash Univ, Melbourne, Vic, Australia
Funding
Australian Research Council
Keywords
Code completion; Multi-task learning
DOI
10.1016/j.infsof.2023.107336
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Context: Code completion aims to improve developers' productivity by suggesting the next code tokens from a given context. Various approaches have been proposed to incorporate abstract syntax tree (AST) information into model training so that code completion is aware of the syntax of the programming language. However, existing syntax-aware code completion approaches are not on-the-fly: we found that for two-thirds of the characters that developers type, the AST cannot be extracted because extraction requires syntactically correct source code, which limits the practicality of these approaches in real-world scenarios. Conversely, existing on-the-fly code completion approaches do not yet consider syntactic information.
Objective: In this paper, we propose PyCoder, which leverages token types, a kind of lightweight syntactic information that is readily available and aligns with the natural order of source code.
Method: PyCoder is trained in a multi-task manner: by learning the supporting task of predicting token types during the training phase, the models achieve better performance on predicting tokens and lines of code without needing token types in the inference phase.
Results: Comprehensive experiments show that PyCoder ranks first on the CodeXGLUE leaderboard with an accuracy of 77.12% for token-level predictions, which is 0.43%-24.25% more accurate than baselines. In addition, PyCoder achieves an exact match of 43.37% for line-level predictions, which is 3.63%-84.73% more accurate than baselines.
Conclusions: These results lead us to conclude that token type information (an alternative to syntactic information), which has rarely been used in the past, can greatly improve the performance of code completion approaches without requiring syntactically correct source code as AST-based approaches do. PyCoder is publicly available on HuggingFace and GitHub.
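The contrast the abstract draws between AST extraction and token types can be illustrated with a minimal sketch. It assumes Python's standard `ast` and `tokenize` modules as stand-ins for the two kinds of syntactic information; it is not PyCoder's actual preprocessing pipeline, and the helper names `try_parse` and `token_types` are hypothetical, introduced here only for illustration.

```python
import ast
import io
import token
import tokenize

# Layout-only token types that are skipped to keep the output compact.
_LAYOUT = {token.NEWLINE, token.NL, token.INDENT, token.DEDENT, token.ENDMARKER}


def try_parse(source: str) -> bool:
    """Return True if `source` parses into an AST, False on a syntax error."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def token_types(source: str) -> list[tuple[str, str]]:
    """Return (token type, token text) pairs lexed from possibly incomplete code."""
    pairs = []
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type in _LAYOUT:
                continue
            pairs.append((token.tok_name[tok.type], tok.string))
    except (tokenize.TokenError, SyntaxError):
        # The lexer complains about the unfinished construct at end of input;
        # the tokens read before that point are kept.
        pass
    return pairs


if __name__ == "__main__":
    partial = "result = compute(x, "  # what a developer has typed so far
    print("AST available:", try_parse(partial))
    for kind, text in token_types(partial):
        print(kind, repr(text))
```

On the half-typed line, parsing fails because the call is not closed, while the lexer still yields a type for each token already typed. This is the sense in which token types are available on the fly and follow the natural left-to-right order of the source.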
Pages: 15