Structured Generative Models of Natural Source Code

Cited by: 0
Authors
Maddison, Chris J. [1]
Tarlow, Daniel [2]
Affiliations
[1] Univ Toronto, Toronto, ON, Canada
[2] Microsoft Res, Cambridge, England
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We study the problem of building generative models of natural source code (NSC); that is, source code written by humans and meant to be understood by humans. Our primary contribution is to describe new generative models that are tailored to NSC. The models are based on probabilistic context free grammars (PCFGs) and neuro-probabilistic language models (Mnih & Teh, 2012), which are extended to incorporate additional source code-specific structure. These models can be efficiently trained on a corpus of source code and outperform a variety of less structured baselines in terms of predictive log likelihoods on held-out data.
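As a rough illustration of the PCFG baseline and the log-likelihood metric mentioned in the abstract, the sketch below scores one derivation tree under a toy grammar. The grammar, its probabilities, and the names `PCFG`, `rule_log_prob`, and `derivation_log_prob` are hypothetical and chosen only for this example; they are not taken from the paper, and the paper's models add further source-code-specific structure that is not shown here.

```python
import math

# Hypothetical toy grammar (not from the paper): each nonterminal maps to
# (right-hand side, probability) pairs whose probabilities sum to 1.
PCFG = {
    "Stmt": [(("Expr", ";"), 0.7), (("return", "Expr", ";"), 0.3)],
    "Expr": [(("x",), 0.5), (("Expr", "+", "Expr"), 0.25), (("1",), 0.25)],
}


def rule_log_prob(lhs, rhs):
    """Return log P(lhs -> rhs) under the toy grammar."""
    for candidate, prob in PCFG[lhs]:
        if candidate == rhs:
            return math.log(prob)
    raise KeyError(f"no rule {lhs} -> {rhs}")


def derivation_log_prob(tree):
    """Sum the rule log-probabilities over a derivation tree.

    A tree is (symbol, children); terminal tokens are plain strings.
    """
    if isinstance(tree, str):
        return 0.0  # terminals contribute no rule probability
    lhs, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    return rule_log_prob(lhs, rhs) + sum(derivation_log_prob(c) for c in children)


# Derivation of the token sequence: return x + 1 ;
tree = (
    "Stmt",
    ["return", ("Expr", [("Expr", ["x"]), "+", ("Expr", ["1"])]), ";"],
)
log_p = derivation_log_prob(tree)
print(f"log-likelihood of derivation: {log_p:.4f}")
```

Averaging such log-probabilities over held-out programs gives the kind of predictive log-likelihood the abstract uses to compare models; a higher value indicates a better fit.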
Pages: 649-657
Page count: 9