Reducing efforts of software engineering systematic literature reviews updates using text classification

Cited by: 13
Authors
Watanabe, Willian Massami [1]
Felizardo, Katia Romero [1]
Candido Jr, Arnaldo [2]
de Souza, Erica Ferreira [1]
de Campos Neto, Jose Ede [1]
Vijaykumar, Nandamudi Lankalapalli [3, 4]
Affiliations
[1] Fed Technol Univ Parana, Cornelio Procopio, PR, Brazil
[2] Fed Technol Univ Parana, Medianeira, PR, Brazil
[3] Natl Inst Space Res, Sao Jose Dos Campos, SP, Brazil
[4] Univ Fed Sao Paulo, Sao Jose Dos Campos, SP, Brazil
Keywords
Systematic literature review; SLR; Automatic selection; Review update; Text classification; Document classification; Text categorization; STRATEGY;
DOI
10.1016/j.infsof.2020.106395
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Context: Systematic Literature Reviews (SLRs) are frequently used to synthesize evidence in Software Engineering (SE); however, replicating SLRs and keeping them up to date is a major challenge. Study selection in an SLR is labor-intensive because of the large number of studies that must be analyzed. Different approaches have been investigated to support SLR processes, such as Visual Text Mining and Text Classification, but acquiring the initial dataset is time-consuming and labor-intensive. Objective: In this work, we propose and evaluate the use of Text Classification to support the selection of new evidence when updating SLRs in SE. Method: We applied Text Classification techniques to investigate how effective they are and how much effort they could save during the study selection phase of an SLR update. In the SLR update scenario, the studies analyzed in the primary SLR can serve as a labeled dataset to train Supervised Machine Learning algorithms. We conducted an experiment with 8 Software Engineering SLRs. In the experiments, we investigated multiple preprocessing and feature extraction tasks, such as tokenization, stop-word removal, word lemmatization, and TF-IDF (Term Frequency/Inverse Document Frequency), with Decision Tree and Support Vector Machines as classification algorithms. Furthermore, we configured the classifier activation threshold to maximize Recall, thereby reducing the number of Missed selected studies. Results: We measured the accuracy of the techniques. When the activation threshold of the classifiers was varied, the results achieved, on average, an F-Score of 0.92 and an exclusion rate of 62%, with an average of 4% Missed selected studies. Both the Exclusion rate and the number of Missed selected studies differed significantly from those of classifiers that did not use the activation threshold configuration.
Conclusion: The results show the potential of these techniques to reduce the effort required for SLR updates.
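The threshold idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy scores, and the exact definitions of exclusion rate and missed rate below are assumptions made for illustration. Given classifier scores for studies already labeled in the primary SLR, it picks the highest activation threshold that still reaches a target Recall, then reports how many studies would be excluded and how many included studies would be missed.

```python
from typing import List, Tuple


def pick_threshold(scores: List[float], labels: List[int],
                   target_recall: float = 1.0) -> float:
    """Return the highest threshold whose Recall on the labeled data
    is at least target_recall (hypothetical helper, not from the paper)."""
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one included study")
    # Try each distinct score as a threshold, from highest to lowest.
    for t in sorted(set(scores), reverse=True):
        hits = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        if hits / positives >= target_recall:
            return t
    return min(scores)  # the lowest score always gives Recall 1.0


def screening_effort(scores: List[float], labels: List[int],
                     threshold: float) -> Tuple[float, float]:
    """Assumed definitions: exclusion rate is the fraction of all studies
    scored below the threshold; missed rate is the fraction of included
    studies scored below the threshold."""
    excluded = [y for s, y in zip(scores, labels) if s < threshold]
    exclusion_rate = len(excluded) / len(scores)
    missed_rate = sum(excluded) / sum(labels)
    return exclusion_rate, missed_rate


# Toy data (made up): 1 = study was included in the primary SLR.
scores = [0.95, 0.80, 0.65, 0.40, 0.35, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

tuned = pick_threshold(scores, labels, target_recall=1.0)
print("tuned threshold:", tuned)
print("tuned   (exclusion, missed):", screening_effort(scores, labels, tuned))
print("default (exclusion, missed):", screening_effort(scores, labels, 0.5))
```

On this toy data, a fixed 0.5 threshold excludes more studies but misses one included study, while the tuned threshold trades some exclusion for no misses. That is the trade-off the abstract describes when it configures the threshold to maximize Recall.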
Pages: 15
Related papers
  • [1] Systematic literature reviews in software engineering
    Wohlin, Claes
    Prikladnicki, Rafael
    [J]. INFORMATION AND SOFTWARE TECHNOLOGY, 2013, 55 (06) : 919 - 920
  • [2] Systematic literature reviews in software engineering - A systematic literature review
    Kitchenham, Barbara
    Brereton, O. Pearl
    Budgen, David
    Turner, Mark
    Bailey, John
    Linkman, Stephen
    [J]. INFORMATION AND SOFTWARE TECHNOLOGY, 2009, 51 (01) : 7 - 15
  • [3] On Using Grey Literature and Google Scholar in Systematic Literature Reviews in Software Engineering
    Yasin, Affan
    Fatima, Rubia
    Wen, Lijie
    Afzal, Wasif
    Azhar, Muhammad
    Torkar, Richard
    [J]. IEEE ACCESS, 2020, 8 : 36226 - 36243
  • [4] When to update systematic literature reviews in software engineering
    Mendes, Emilia
    Wohlin, Claes
    Felizardo, Katia
    Kalinowski, Marcos
    [J]. JOURNAL OF SYSTEMS AND SOFTWARE, 2020, 167
  • [5] Systematic literature reviews in software engineering - A tertiary study
    Kitchenham, Barbara
    Pretorius, Rialette
    Budgen, David
    Brereton, O. Pearl
    Turner, Mark
    Niazi, Mahmood
    Linkman, Stephen
    [J]. INFORMATION AND SOFTWARE TECHNOLOGY, 2010, 52 (08) : 792 - 805
  • [6] The need for multivocal literature reviews in software engineering: complementing systematic literature reviews with grey literature
    Garousi, Vahid
    Felderer, Michael
    Mantyla, Mika V.
    [J]. PROCEEDINGS OF THE 20TH INTERNATIONAL CONFERENCE ON EVALUATION AND ASSESSMENT IN SOFTWARE ENGINEERING 2016 (EASE '16), 2016,
  • [7] Analysing app reviews for software engineering: a systematic literature review
    Dąbrowski, Jacek
    Letier, Emmanuel
    Perini, Anna
    Susi, Angelo
    [J]. EMPIRICAL SOFTWARE ENGINEERING, 2022, 27
  • [8] Defining protocols of Systematic Literature Reviews in Software Engineering: a survey
    Felizardo, Katia Romero
    De Souza, Erica Ferreira
    Falbo, Ricardo Almeida
    Vijaykumar, Nandamudi Lankalapalli
    Mendes, Emilia
    Nakagawa, Elisa Yumi
    [J]. 2017 43RD EUROMICRO CONFERENCE ON SOFTWARE ENGINEERING AND ADVANCED APPLICATIONS (SEAA), 2017, : 202 - 209
  • [9] Quality Assessment in Systematic Literature Reviews: A Software Engineering Perspective
    Yang, Lanxin
    Zhang, He
    Shen, Haifeng
    Huang, Xin
    Zhou, Xin
    Rong, Guoping
    Shao, Dong
    [J]. INFORMATION AND SOFTWARE TECHNOLOGY, 2021, 130
  • [10] A critical appraisal tool for systematic literature reviews in software engineering
    bin Ali, Nauman
    Usman, Muhammad
    [J]. INFORMATION AND SOFTWARE TECHNOLOGY, 2019, 112 : 48 - 50