Cnerator: A Python application for the controlled stochastic generation of standard C source code

Cited by: 1
Authors
Ortin, Francisco [1, 2]
Escalada, Javier [1]
Affiliations
[1] Univ Oviedo, Comp Sci Dept, Federico Garcia Lorca 18, Oviedo 33007, Spain
[2] Munster Technol Univ, Dept Comp Sci, Rossa Ave, Cork, Ireland
Keywords
Big code; Mining software repositories; Machine learning; C programming language; Stochastic program generation; Python
DOI
10.1016/j.softx.2021.100711
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
The Big Code and Mining Software Repositories research lines analyze large amounts of source code to improve software engineering practices. Massive codebases are used to train machine learning models aimed at improving the software development process. One example is decompilation, where C code and its compiled binaries can be used to train machine learning models to improve decompilation. However, obtaining massive codebases of portable C code is not an easy task, since most applications use particular libraries, operating systems, or language extensions. In this paper, we present Cnerator, a Python application that provides the stochastic generation of large amounts of standard C code. It is highly configurable, allowing the user to specify the probability distributions of each language construct, properties of the generated code, and post-processing modifications of the output programs. Cnerator has been successfully used to generate code that, utilized to train machine learning models, has improved the performance of existing decompilers. It has also been used in the implementation of an infrastructure for the automatic extraction of code patterns. © 2021 The Author(s). Published by Elsevier B.V.
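As a rough illustration of what the abstract means by generating C code from user-specified probability distributions over language constructs, the following minimal Python sketch picks each statement kind by weighted random choice. It is not Cnerator's implementation or API: the names STATEMENT_PROBS, gen_expr, gen_stmt, and gen_function, and the probability values, are all assumptions made up for this example.

```python
import random

# Hypothetical construct probabilities. These names and values are
# illustrative assumptions, not Cnerator's actual configuration.
STATEMENT_PROBS = {"assignment": 0.5, "if": 0.3, "return": 0.2}


def gen_expr(rng, depth=0):
    """Return a random integer expression over the variables a and b."""
    # Stop recursing at depth 2, or earlier with probability 0.4.
    if depth >= 2 or rng.random() < 0.4:
        return rng.choice(["a", "b", str(rng.randint(0, 9))])
    op = rng.choice(["+", "-", "*"])
    left = gen_expr(rng, depth + 1)
    right = gen_expr(rng, depth + 1)
    return f"({left} {op} {right})"


def gen_stmt(rng, probs):
    """Pick a statement kind according to probs and render it as C."""
    kinds = list(probs)
    kind = rng.choices(kinds, weights=[probs[k] for k in kinds])[0]
    if kind == "assignment":
        return f"a = {gen_expr(rng)};"
    if kind == "if":
        return f"if ({gen_expr(rng)}) {{ b = {gen_expr(rng)}; }}"
    return f"return {gen_expr(rng)};"


def gen_function(name, n_stmts, probs=STATEMENT_PROBS, seed=None):
    """Generate the source text of one C function with n_stmts random statements."""
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    body = [gen_stmt(rng, probs) for _ in range(n_stmts)]
    body.append("return a;")  # make sure the function always returns a value
    lines = [f"int {name}(int a, int b) {{"]
    lines += [f"    {stmt}" for stmt in body]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(gen_function("f0", 4, seed=1))
```

Changing the weights in STATEMENT_PROBS shifts the mix of constructs in the output, which is the kind of control the abstract describes. The sketch makes no attempt to avoid runtime issues such as signed integer overflow in the generated expressions.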
Pages: 7
Related papers
50 in total
  • [31] p-winds: An open-source Python code to model planetary outflows and upper atmospheres
    Dos Santos, Leonardo A.
    Vidotto, Aline A.
    Vissapragada, Shreyas
    Alam, Munazza K.
    Allart, Romain
    Bourrier, Vincent
    Kirk, James
    Seidel, Julia V.
    Ehrenreich, David
    ASTRONOMY & ASTROPHYSICS, 2022, 659
  • [32] GWFAST: A Fisher Information Matrix Python Code for Third-generation Gravitational-wave Detectors
    Iacovelli, Francesco
    Mancarella, Michele
    Foffa, Stefano
    Maggiore, Michele
    ASTROPHYSICAL JOURNAL SUPPLEMENT SERIES, 2022, 263 (01)
  • [33] Application of Open-Source, Python-Based Tools for the Simulation of Electrochemical Systems
    Molel, Evans Leshinka
    Fuller, Thomas F.
    JOURNAL OF THE ELECTROCHEMICAL SOCIETY, 2023, 170 (10)
  • [34] CHROMSTRUCT 4: A Python Code to Estimate the Chromatin Structure from Hi-C Data
    Caudai, Claudia
    Salerno, Emanuele
    Zoppe, Monica
    Merelli, Ivan
    Tonazzini, Anna
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2019, 16 (06) : 1867 - 1878
  • [35] Computing with CodeRunner at Coventry University: Automated summative assessment of Python and C++ code
    Croft, David
    England, Matthew
    PROCEEDINGS OF THE 4TH CONFERENCE ON COMPUTING EDUCATION PRACTICE, CEP 2020, 2020
  • [36] CPPE: An Open-Source C++ and Python Library for Polarizable Embedding
    Scheurer, Maximilian
    Reinholdt, Peter
    Kjellgren, Erik Rosendahl
    Olsen, Jogvan Magnus Haugaard
    Dreuw, Andreas
    Kongsted, Jacob
    JOURNAL OF CHEMICAL THEORY AND COMPUTATION, 2019, 15 (11) : 6154 - 6163
  • [37] A Lightweight DFT-Based Approach to the Optical Measurement of Displacements Using an Open-Source Python Code
    Nezerka, V
    Havlasek, P.
    EXPERIMENTAL TECHNIQUES, 2022, 46 (03) : 485 - 496
  • [38] A Controlled Experiment on Python vs C for an Introductory Programming Course: Student's Outcomes
    Wainer, Jacques
    Xavier, Eduardo C.
    ACM TRANSACTIONS ON COMPUTING EDUCATION, 2018, 18 (03)
  • [39] Code Analysis with Static Application Security Testing for Python Program
    Li Ma
    Huihong Yang
    Jianxiong Xu
    Zexian Yang
    Qidi Lao
    Dong Yuan
    Journal of Signal Processing Systems, 2022, 94 : 1169 - 1182
  • [40] CHICOM: Code for comparing weighted or unweighted histograms in Fortran-77, C++, R and Python
    Gagunashvili, Nikolay D.
    Halldorsson, Helgi
    Neukirchen, Helmut
    COMPUTER PHYSICS COMMUNICATIONS, 2019, 245