Linguacodus: a synergistic framework for transformative code generation in machine learning pipelines

Cited by: 0
Authors
Trofimova, Ekaterina [1]
Sataev, Emil [1]
Ustyuzhanin, Andrey [2, 3]
Affiliations
[1] Higher Sch Econ, Fac Comp Sci, Moscow, Russia
[2] Natl Univ Singapore, IFIM, Singapore, Singapore
[3] Constructor Univ, Sch Comp Sci & Engn, Bremen, Germany
Funding
Russian Science Foundation;
Keywords
Automated code generation; Large language models; Machine learning pipelines;
DOI
10.7717/peerj-cs.2328
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the ever-evolving landscape of machine learning, seamless translation of natural language descriptions into executable code remains a formidable challenge. This article introduces Linguacodus, an innovative framework designed to tackle this challenge by deploying a dynamic pipeline that iteratively transforms natural language task descriptions into code through high-level data-shaping instructions. The core of Linguacodus is a fine-tuned large language model, empowered to evaluate diverse solutions for various problems and select the most fitting one for a given task. This article details the fine-tuning process and sheds light on how natural language descriptions can be translated into functional code. Linguacodus represents a substantial leap towards automated code generation, effectively bridging the gap between task descriptions and executable code. It holds great promise for advancing machine learning applications across diverse domains. Additionally, we propose an algorithm capable of transforming a natural description of an ML task into code with minimal human interaction. In extensive experiments on a vast machine learning code dataset originating from Kaggle, we showcase the effectiveness of Linguacodus. The investigations highlight its potential applications across diverse domains, emphasizing its impact on applied machine learning in various scientific fields.
Pages: 32