Natural Language to Code Generation in Interactive Data Science Notebooks

Authors
Yin, Pengcheng [1]
Li, Wen-Ding [1]
Xiao, Kefan [1]
Rao, Abhishek [1]
Wen, Yeming [1]
Shi, Kensen [1]
Howland, Joshua [1]
Bailey, Paige [1]
Catasta, Michele [1]
Michalewski, Henryk [1]
Polozov, Alex [1]
Sutton, Charles [1]
Affiliations
[1] Google Inc, Mountain View, CA 94043 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Computational notebooks, such as Jupyter notebooks, are interactive computing environments that are ubiquitous among data scientists to perform data wrangling and analytic tasks. To measure the performance of AI pair programmers that automatically synthesize programs for those tasks given natural language (NL) intents from users, we build ARCADE, a benchmark of 1,078 code generation problems using the pandas data analysis framework in data science notebooks. ARCADE features multiple rounds of NL-to-code problems from the same notebook. It requires a model to understand rich multi-modal contexts, such as existing notebook cells and their execution states as well as previous turns of interaction. To establish a strong baseline on this challenging task, we develop PACHINCO, a 62B code language model (LM) for Python computational notebooks, which significantly outperforms public code LMs. Finally, we explore few-shot prompting strategies to elicit better code with step-by-step decomposition and NL explanations, showing the potential to improve the diversity and explainability of model predictions. ARCADE is publicly available at https://github.com/google-research/arcade-nl2code/.
Pages: 126-173
Page count: 48