Reducing efforts of software engineering systematic literature reviews updates using text classification

Cited by: 13

Authors:

Watanabe, Willian Massami [1]

Felizardo, Katia Romero [1]

Candido Jr, Arnaldo [2]

de Souza, Erica Ferreira [1]

de Campos Neto, Jose Ede [1]

Vijaykumar, Nandamudi Lankalapalli [3,4]

Affiliations:

[1] Fed Technol Univ Parana, Cornelio Procopio, PR, Brazil

[2] Fed Technol Univ Parana, Medianeira, PR, Brazil

[3] Natl Inst Space Res, Sao Jose Dos Campos, SP, Brazil

[4] Univ Fed Sao Paulo, Sao Jose Dos Campos, SP, Brazil

Source:

INFORMATION AND SOFTWARE TECHNOLOGY | 2020 / Vol. 128

Keywords:

Systematic literature review; SLR; Automatic selection; Review update; Text classification; Document classification; Text categorization; STRATEGY;

DOI:

10.1016/j.infsof.2020.106395

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Context: Systematic Literature Reviews (SLRs) are frequently used to synthesize evidence in Software Engineering (SE); however, replicating SLRs and keeping them up to date is a major challenge. Study selection in an SLR is labor-intensive because of the large number of studies that must be analyzed. Different approaches have been investigated to support SLR processes, such as Visual Text Mining and Text Classification, but acquiring the initial dataset is itself time-consuming and labor-intensive.
Objective: In this work, we propose and evaluate the use of Text Classification to support the selection of new evidence when updating SLRs in SE.
Method: We applied Text Classification techniques to investigate how effective they are and how much effort could be saved during the study selection phase of an SLR update. In the SLR update scenario, the studies analyzed in the primary SLR can be used as a classified dataset to train Supervised Machine Learning algorithms. We conducted an experiment with 8 Software Engineering SLRs. In the experiments, we investigated multiple preprocessing and feature extraction tasks, such as tokenization, stop-word removal, word lemmatization, and TF-IDF (Term Frequency/Inverse Document Frequency), with Decision Tree and Support Vector Machines as classification algorithms. Furthermore, we configured the classifier activation threshold to maximize Recall, thereby reducing the number of Missed selected studies.
Results: We measured the accuracy of the techniques. When the activation threshold of the classifiers was varied, the results reached, on average, an F-Score of 0.92 and an exclusion rate of 62%, with an average of 4% Missed selected studies. Both the Exclusion rate and the number of Missed selected studies differed significantly from those of classifiers that did not use the activation-threshold configuration.
Conclusion: The results show the potential of the techniques to reduce the effort required for SLR updates.
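
The threshold idea in the Method section can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy scores, and the exact definitions of "exclusion rate" (the share of candidate studies the classifier discards) and "missed rate" (the share of truly relevant studies it discards) are assumptions made for illustration. The sketch picks the highest activation threshold that still reaches a target Recall on the studies of the primary SLR, then applies that threshold to the candidate studies of the update.

```python
import math
from typing import Dict, List


def pick_threshold(scores: List[float], labels: List[int],
                   target_recall: float = 1.0) -> float:
    """Highest threshold that keeps at least target_recall of the included studies.

    scores: classifier scores for the studies of the primary SLR (hypothetical).
    labels: 1 = study was included in the primary SLR, 0 = excluded.
    A study is kept for manual reading when score >= threshold.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("need at least one included study to set a threshold")
    # Number of included studies that must stay above the threshold.
    needed = max(1, math.ceil(target_recall * len(positives)))
    return positives[needed - 1]


def evaluate(scores: List[float], labels: List[int],
             threshold: float) -> Dict[str, float]:
    """Exclusion rate and missed rate (assumed definitions, see lead-in)."""
    kept = [s >= threshold for s in scores]
    excluded = kept.count(False)
    missed = sum(1 for k, y in zip(kept, labels) if y == 1 and not k)
    relevant = sum(labels)
    return {
        "exclusion_rate": excluded / len(scores),
        "missed_rate": missed / relevant if relevant else 0.0,
    }


# Toy data (invented): scores on the primary SLR, used to set the threshold.
primary_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1, 0.3, 0.05]
primary_labels = [1, 1, 1, 0, 0, 0, 0, 0]
threshold = pick_threshold(primary_scores, primary_labels, target_recall=1.0)

# Toy data (invented): candidate studies found when updating the SLR.
update_scores = [0.7, 0.4, 0.3, 0.2, 0.1, 0.5, 0.15, 0.25, 0.05, 0.33]
update_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
result = evaluate(update_scores, update_labels, threshold)
print(threshold, result)
```

Lowering the target Recall raises the threshold, which excludes more studies from manual reading but raises the risk of missing relevant ones. This is the trade-off the reported Exclusion rate and Missed figures describe.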

Pages: 15

