Cnerator: A Python application for the controlled stochastic generation of standard C source code

Cited by: 1
Authors
Ortin, Francisco [1, 2]
Escalada, Javier [1]
Affiliations
[1] Univ Oviedo, Comp Sci Dept, Federico Garcia Lorca 18, Oviedo 33007, Spain
[2] Munster Technol Univ, Dept Comp Sci, Rossa Ave, Cork, Ireland
Keywords
Big code; Mining software repositories; Machine learning; C programming language; Stochastic program generation; Python
DOI
10.1016/j.softx.2021.100711
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
The Big Code and Mining Software Repositories research lines analyze large amounts of source code to improve software engineering practices. Massive codebases are used to train machine learning models aimed at improving the software development process. One example is decompilation, where C code and its compiled binaries can be used to train machine learning models to improve decompilation. However, obtaining massive codebases of portable C code is not an easy task, since most applications use particular libraries, operating systems, or language extensions. In this paper, we present Cnerator, a Python application that provides the stochastic generation of large amounts of standard C code. It is highly configurable, allowing the user to specify the probability distributions of each language construct, properties of the generated code, and post-processing modifications of the output programs. Cnerator has been successfully used to generate code that, utilized to train machine learning models, has improved the performance of existing decompilers. It has also been used in the implementation of an infrastructure for the automatic extraction of code patterns. (C) 2021 The Author(s). Published by Elsevier B.V.
Pages: 7
Related papers
50 in total
  • [41] pyCSAMT: An alternative Python toolbox for groundwater exploration using controlled source audio-frequency magnetotelluric
    Kouadio, Kouao Laurent
    Liu, Rong
    Mi, Binbin
    Liu, Chun-ming
    JOURNAL OF APPLIED GEOPHYSICS, 2022, 201
  • [42] PyMTL3: A Python Framework for Open-Source Hardware Modeling, Generation, Simulation, and Verification
    Jiang, Shunning
    Pan, Peitian
    Ou, Yanghui
    Batten, Christopher
    IEEE MICRO, 2020, 40 (04) : 58 - 66
  • [43] Using Software Metrics for Predicting Vulnerable Code-Components: A Study on Java and Python Open Source Projects
    Chong, Tai-Yin
    Anu, Vaibhav
    Sultana, Kazi Zakia
    2019 22ND IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (IEEE CSE 2019) AND 17TH IEEE INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (IEEE EUC 2019), 2019, : 98 - 103
  • [44] ERSN-OpenMC-Py: A python-based open-source software for OpenMC Monte Carlo code
    Lahdour, M.
    El Bardouni, T.
    El Hajjaji, O.
    El Bakkali, J.
    Al-Zain, J.
    Oulad-Belayachi, S.
    Ziani, H.
    Idrissi, Abdelghani
    Hlaibi, S. El Maliki El
    COMPUTER PHYSICS COMMUNICATIONS, 2024, 299
  • [45] mfapy: An open-source Python package for 13C-based metabolic flux analysis
    Matsuda, Fumio
    Maeda, Kousuke
    Taniguchi, Takeo
    Kondo, Yuya
    Yatabe, Futa
    Okahashi, Nobuyuki
    Shimizu, Hiroshi
    METABOLIC ENGINEERING COMMUNICATIONS, 2021, 13
  • [46] Using the uniqueness of global identifiers to determine the provenance of Python software source code
    Yiming Sun
    Daniel German
    Stefano Zacchiroli
    Empirical Software Engineering, 2023, 28
  • [47] Cardio PyMEA: A user-friendly, open-source Python application for cardiomyocyte microelectrode array analysis
    Dunham, Christopher R.
    Mackenzie, Madelynn
    Nakano, Haruko Z.
    Kim, Alexis K.
    Nakano, Atsushi
    Stieg, Adam
    Gimzewski, James
    PLOS ONE, 2022, 17 (05):
  • [48] Water Data Explorer: An Open-Source Web Application and Python Library for Water Resources Data Discovery
    Bustamante, Giovanni Romero
    Nelson, Everett James
    Ames, Daniel P.
    Williams, Gustavious P.
    Jones, Norman L.
    Boldrini, Enrico
    Chernov, Igor
    Sanchez Lozano, Jorge Luis
    WATER, 2021, 13 (13)
  • [49] CHIWEI: A code of goodness of fit tests for weighted and unweighted histograms in Fortran-77, C++, R and Python
    Gagunashvili, Nikolay D.
    Halldorsson, Helgi
    COMPUTER PHYSICS COMMUNICATIONS, 2018, 231 : 245 - 245
  • [50] CiRA: An Open-Source Python Package for Automated Generation of Test Case Descriptions from Natural Language Requirements
    Frattini, Julian
    Fischbach, Jannik
    Bauer, Andreas
    2023 IEEE 31ST INTERNATIONAL REQUIREMENTS ENGINEERING CONFERENCE WORKSHOPS, REW, 2023, : 68 - 71