FindICI: Using machine learning to detect linguistic inconsistencies between code and natural language descriptions in infrastructure-as-code

Authors
Nemania Borovits
Indika Kumara
Dario Di Nucci
Parvathy Krishnan
Stefano Dalla Palma
Fabio Palomba
Damian A. Tamburri
Willem-Jan van den Heuvel
Affiliations
[1] Tilburg University, Jheronimus Academy of Data Science
[2] University of Salerno, Jheronimus Academy of Data Science
[3] Technical University Eindhoven
Keywords
Infrastructure as code; Linguistic anti-patterns; Word embedding; Machine learning; Deep learning
DOI
Not available
Abstract
Linguistic anti-patterns are recurring poor practices concerning inconsistencies in the naming, documentation, and implementation of an entity. They impede the readability, understandability, and maintainability of source code. This paper attempts to detect linguistic anti-patterns in Infrastructure-as-Code (IaC) scripts used to provision and manage computing environments. In particular, we consider inconsistencies between the logic/body of IaC code units and their short text names. To this end, we propose FindICI, a novel automated approach that employs word embedding and classification algorithms. We build the abstract syntax tree of each IaC code unit and use it to create code embeddings, which machine learning techniques then use to detect inconsistent IaC code units. We evaluated our approach in two experiments on Ansible tasks systematically extracted from open-source repositories, covering various word embedding models and classification algorithms. Classical machine learning models and novel deep learning models with different word embedding methods showed comparable and satisfactory results in detecting inconsistent Ansible tasks related to the ten most used Ansible modules.
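The idea of comparing a task's name with its body can be illustrated with a toy sketch. This is not the FindICI implementation: the abstract describes learned word embeddings and trained classifiers, whereas the sketch below swaps in plain token counts and cosine similarity as a stand-in. The task dictionaries (standing in for parsed Ansible YAML) and all function names are hypothetical.

```python
from collections import Counter
from math import sqrt
import re


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def flatten_task(node):
    """Depth-first walk over a parsed task (nested dicts, lists, scalars),
    yielding tokens from keys and values in traversal order."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from tokenize(str(key))
            yield from flatten_task(value)
    elif isinstance(node, (list, tuple)):
        for item in node:
            yield from flatten_task(item)
    else:
        yield from tokenize(str(node))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def name_body_similarity(task):
    """Compare the task's name with everything else in the task.
    Assumption: count vectors replace the learned embeddings of the paper."""
    name_vec = Counter(tokenize(task.get("name", "")))
    body = {k: v for k, v in task.items() if k != "name"}
    body_vec = Counter(flatten_task(body))
    return cosine(name_vec, body_vec)


# Hypothetical tasks: the first name matches its body, the second does not.
consistent = {
    "name": "Install nginx package",
    "apt": {"name": "nginx", "state": "present"},
}
inconsistent = {
    "name": "Install nginx package",
    "service": {"name": "mysql", "state": "stopped"},
}

print(name_body_similarity(consistent) > name_body_similarity(inconsistent))
```

A token-overlap score like this only catches mismatches that show up as different surface words. An approach based on learned embeddings and a trained classifier, as described in the abstract, would presumably be used precisely because names and bodies often use different vocabulary for the same intent.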
Source
Empirical Software Engineering, 2022, 27 (07)