Natural Language or Not (NLoN) - A Package for Software Engineering Text Analysis Pipeline

Cited by: 16
Authors
Mantyla, Mika V. [1 ]
Calefato, Fabio [2 ]
Claes, Maelick [1 ]
Affiliations
[1] Univ Oulu, M3S, Oulu, Finland
[2] Univ Bari, Dipartimento Jon, Bari, Italy
Keywords
natural language processing; preprocessing; filtering; machine learning; regular expressions; character n-grams; glmnet; lasso; logistic regression
DOI
10.1145/3196398.3196444
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
The use of natural language processing (NLP) is gaining popularity in software engineering. To perform NLP correctly, we must pre-process the textual information to separate natural language from other content, such as log messages, that is often part of communication in software engineering. We present a simple approach for classifying whether a textual input is natural language or not. Although our NLoN package relies on only 11 language features and character tri-grams, it achieves an area under the ROC curve between 0.976 and 0.987 on three different data sources, with Lasso regression from Glmnet as the learner and two human raters providing ground truth. Cross-source prediction performance is lower and fluctuates more, with top ROC performances ranging from 0.913 to 0.980. Compared with prior work, our approach offers similar performance but is considerably more lightweight, making it easier to apply in software engineering text mining pipelines. Our source code and data are provided as an R package for further improvements.
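As a rough illustration of the kind of input the abstract describes, the following Python sketch computes character tri-gram counts and a handful of line-level features. It is not the NLoN package (which is written in R) and does not reproduce its 11 features; the function names `char_trigrams` and `simple_features`, the four features shown, and the two sample lines are all assumptions made for this example. In the approach described above, features of this kind would then be fed to a Lasso-regularized logistic regression, which is omitted here.

```python
from collections import Counter


def char_trigrams(text: str) -> Counter:
    """Count overlapping character tri-grams in a lowercased line."""
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def simple_features(text: str) -> dict:
    """A few hypothetical line-level features (not the paper's 11)."""
    n = max(len(text), 1)  # avoid division by zero on empty input
    return {
        # share of upper-case characters
        "ratio_caps": sum(c.isupper() for c in text) / n,
        # share of characters that are neither alphanumeric nor whitespace
        "ratio_special": sum(
            not c.isalnum() and not c.isspace() for c in text
        ) / n,
        # number of whitespace-separated tokens
        "n_words": len(text.split()),
        # whether the line ends like a sentence
        "ends_with_punct": text.rstrip().endswith((".", "!", "?")),
    }


# Illustrative inputs: one prose sentence and one stack-trace-like line.
prose = "I think the build fails because of the new parser."
log_line = "at org.foo.Bar.run(Bar.java:42)"

if __name__ == "__main__":
    print(simple_features(prose))
    print(simple_features(log_line))
    print(char_trigrams(prose).most_common(3))
```

On these two samples, the stack-trace-like line has a much higher share of special characters and does not end with sentence punctuation, which is the sort of signal a linear classifier can pick up.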
Pages: 387-391
Number of pages: 5