All You Need Is Logs: Improving Code Completion by Learning from Anonymous IDE Usage Logs

Cited by: 2
Authors
Bibaev, Vitaliy [1]
Kalina, Alexey [2]
Lomshakov, Vadim [3]
Golubev, Yaroslav [1]
Bezzubov, Alexander [4]
Povarov, Nikita [4]
Bryksin, Timofey [5]
Affiliations
[1] JetBrains, Belgrade, Serbia
[2] JetBrains, Munich, Germany
[3] JetBrains, St Petersburg, Russia
[4] JetBrains, Amsterdam, Netherlands
[5] JetBrains Research, Limassol, Cyprus
Keywords
anonymous usage logs; code completion; integrated development environment; machine learning; A/B-testing
DOI
10.1145/3540250.3558968
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
In this work, we propose an approach for collecting completion usage logs from users in an IDE and using them to train a machine-learning-based model for ranking completion candidates. We developed a set of features that describe completion candidates and their context, and deployed their anonymized collection in the Early Access Program of IntelliJ-based IDEs. We used the logs to build a dataset of code completions from users and used it to train a CatBoost ranking model. We then evaluated the model in two settings: on a held-out set of the collected completions, and in a separate A/B test on two different groups of IDE users. Our evaluation shows that a simple ranking model trained on logs of past user behavior significantly improved the code completion experience. Compared to the default heuristics-based ranking, our model reduced the number of typing actions needed to perform a completion in the IDE from 2.073 to 1.832. The approach adheres to privacy requirements and legal constraints because it does not require collecting personal information and performs all necessary anonymization on the client side. Importantly, the approach can be improved continuously by implementing new features, collecting new data, and evaluating new models. We have used it this way in production since the end of 2020.
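The abstract describes training a ranking model on logged completions but does not spell out the data layout. The sketch below is a rough illustration only and makes several assumptions. It assumes each logged completion popup records the candidates shown, a feature vector per candidate, and which candidate the user chose. It flattens such logs into query-grouped rows, the usual input for group-aware ranking learners such as CatBoost's ranking mode. The class `CompletionSession`, the helpers `to_ranking_rows` and `selected_position`, and the feature names `prefix_match` and `recent_use` are all hypothetical. The hand-weighted `learned` scorer stands in for a trained model and is not the paper's CatBoost model or its feature set.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


# Hypothetical log record for one completion popup. The feature names used
# below ("prefix_match", "recent_use") are illustrative, not the paper's features.
@dataclass
class CompletionSession:
    session_id: int
    candidates: List[Dict[str, float]]  # one feature dict per shown candidate
    selected: int  # index of the candidate the user actually chose


def to_ranking_rows(
    sessions: List[CompletionSession],
) -> List[Tuple[int, Dict[str, float], int]]:
    """Flatten sessions into (group_id, features, label) rows.

    Each popup becomes one query group: the chosen candidate is labeled 1 and
    every other candidate 0, the layout that group-aware ranking learners expect.
    """
    rows = []
    for s in sessions:
        for i, feats in enumerate(s.candidates):
            rows.append((s.session_id, feats, 1 if i == s.selected else 0))
    return rows


def selected_position(
    session: CompletionSession, score: Callable[[Dict[str, float]], float]
) -> int:
    """1-based position of the chosen candidate when sorted by descending score."""
    # sorted() is stable, so tied candidates keep their original order.
    order = sorted(
        range(len(session.candidates)),
        key=lambda i: -score(session.candidates[i]),
    )
    return order.index(session.selected) + 1


def heuristic(f: Dict[str, float]) -> float:
    # Hypothetical fixed rule: rank by prefix match only.
    return f["prefix_match"]


def learned(f: Dict[str, float]) -> float:
    # Hypothetical hand-set weights standing in for a trained ranking model.
    return f["prefix_match"] + 2.0 * f["recent_use"]


# Toy data: two logged popups (made up for illustration).
sessions = [
    CompletionSession(1, [{"prefix_match": 0.2, "recent_use": 0.0},
                          {"prefix_match": 0.9, "recent_use": 1.0}], selected=1),
    CompletionSession(2, [{"prefix_match": 1.0, "recent_use": 0.0},
                          {"prefix_match": 0.5, "recent_use": 1.0},
                          {"prefix_match": 0.1, "recent_use": 0.0}], selected=1),
]

print([selected_position(s, heuristic) for s in sessions])  # [1, 2]
print([selected_position(s, learned) for s in sessions])    # [1, 1]
```

In this toy case, the weighted scorer moves the chosen item in the second popup from position 2 to position 1. In the pipeline the abstract describes, the scoring function would be a CatBoost model trained on grouped rows like these. The paper reports the number of typing actions, not this position-based proxy.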
Pages: 1269-1279
Number of pages: 11