Multilingual Code Co-evolution using Large Language Models

Cited by: 1
Authors
Zhang, Jiyang [1]
Nie, Pengyu [1]
Li, Junyi Jessy [1]
Gligoric, Milos [1]
Affiliations
[1] UT Austin, Austin, TX 78712 USA
Funding
U.S. National Science Foundation;
Keywords
Language models; code translation; software evolution;
DOI
10.1145/3611643.3616350
Chinese Library Classification
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Many software projects implement APIs and algorithms in multiple programming languages. Maintaining such projects is tiresome, as developers have to ensure that any change (e.g., a bug fix or a new feature) is being propagated, timely and without errors, to implementations in other programming languages. In the world of ever-changing software, using rule-based translation tools (i.e., transpilers) or machine learning models for translating code from one language to another provides limited value. Translating each time the entire codebase from one language to another is not the way developers work. In this paper, we target a novel task: translating code changes from one programming language to another using large language models (LLMs). We design and implement the first LLM, dubbed Codeditor, to tackle this task. Codeditor explicitly models code changes as edit sequences and learns to correlate changes across programming languages. To evaluate Codeditor, we collect a corpus of 6,613 aligned code changes from 8 pairs of open-source software projects implementing similar functionalities in two programming languages (Java and C#). Results show that Codeditor outperforms the state-of-the-art approaches by a large margin on all commonly used automatic metrics. Our work also reveals that Codeditor is complementary to the existing generation-based models, and their combination ensures even greater performance.
Pages: 695-707
Page count: 13
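
Illustrative sketch
The abstract states that Codeditor models code changes as edit sequences. The record does not give the model's actual edit format, so the following is only a rough illustration of the general idea, using Python's standard `difflib` as a stand-in: a change is expressed as a short list of token-level edits rather than as the full new version of the code. The function name `edit_sequence` and the whitespace tokenization are assumptions made for this example and are not taken from the paper.

```python
import difflib


def edit_sequence(old_tokens, new_tokens):
    """Describe a code change as a list of (operation, removed, added) edits.

    Hypothetical helper for illustration only; this is not Codeditor's format.
    """
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged tokens are not part of the edit sequence
        edits.append((tag, old_tokens[i1:i2], new_tokens[j1:j2]))
    return edits


# A one-token change, tokenized naively on whitespace (an assumption).
old = "return a + b ;".split()
new = "return a - b ;".split()
print(edit_sequence(old, new))
```

Representing the change this way keeps the description proportional to the size of the change, not to the size of the surrounding code, which is the intuition behind translating changes instead of whole codebases.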