A Comparison of Natural Language Understanding Platforms for Chatbots in Software Engineering

Cited by: 30

Authors:

Abdellatif, Ahmad [1]

Badran, Khaled [1]

Costa, Diego Elias [1]

Shihab, Emad [1]

Affiliations:

[1] Concordia Univ, Dept Comp Sci & Software Engn, Data Driven Anal Software DAS Lab, Montreal, PQ H3G 1M8, Canada

Source:

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING | 2022 / Vol. 48 / No. 08

Keywords:

Software chatbots; natural language understanding platforms; empirical software engineering; COEFFICIENT;

DOI:

10.1109/TSE.2021.3078384

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Chatbots are envisioned to dramatically change the future of Software Engineering, allowing practitioners to chat and inquire about their software projects and interact with different services using natural language. At the heart of every chatbot is a Natural Language Understanding (NLU) component that enables the chatbot to understand natural language input. Recently, many NLU platforms have been provided to serve as an off-the-shelf NLU component for chatbots; however, selecting the best NLU for Software Engineering chatbots remains an open challenge. Therefore, in this paper, we evaluate four of the most commonly used NLUs, namely IBM Watson, Google Dialogflow, Rasa, and Microsoft LUIS, to shed light on which NLU should be used in Software Engineering based chatbots. Specifically, we examine the NLUs' performance in classifying intents, the stability of their confidence scores, and extracting entities. To evaluate the NLUs, we use two datasets that reflect two common tasks performed by Software Engineering practitioners: 1) the task of chatting with the chatbot to ask questions about software repositories, and 2) the task of asking development questions on Q&A forums (e.g., Stack Overflow). According to our findings, IBM Watson is the best performing NLU when considering the three aspects (intents classification, confidence scores, and entity extraction). However, the results from each individual aspect show that, in intents classification, IBM Watson performs the best with an F1-measure > 84%, but in confidence scores, Rasa comes out on top with a median confidence score higher than 0.91. Our results also show that all NLUs, except for Dialogflow, generally provide trustable confidence scores. For entity extraction, Microsoft LUIS and IBM Watson outperform other NLUs in the two SE tasks. Our results provide guidance to software engineering practitioners when deciding which NLU to use in their chatbots.


Pages: 3087-3102

Number of pages: 16

50 items in total

[1] Multi-intent Hierarchical Natural Language Understanding for Chatbots
Rychalska, Barbara
Glabska, Helena
Wroblewska, Anna
[J]. 2018 FIFTH INTERNATIONAL CONFERENCE ON SOCIAL NETWORKS ANALYSIS, MANAGEMENT AND SECURITY (SNAMS), 2018, : 256 - 259
[2] Custom Natural Language Understanding for Healthcare Chatbots and A Case Study
Inupakutika, Devasena
Akopian, David
Reddy, Ganesh
Chalela, Patricia
Kaghyan, Sahak
Mundlamuri, Rahul
[J]. 2024 IEEE INTERNATIONAL CONFERENCE ON DIGITAL HEALTH, ICDH 2024, 2024, : 114 - 122
[3] A Natural Language Understanding Model COVID-19 based for chatbots
dos Santos Junior, Valmir Oliveira
Castelo Branco, Joao Araujo
de Oliveira, Marcos Antonio
Coelho da Silva, Ticiana L.
Cruz, Livia Almada
Magalhaes, Regis Pires
[J]. 2021 IEEE 21ST INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (IEEE BIBE 2021), 2021,
[4] Effective Crowdsourced Generation of Training Data for Chatbots Natural Language Understanding
Bapat, Rucha
Kucherbaev, Pavel
Bozzon, Alessandro
[J]. WEB ENGINEERING, ICWE 2018, 2018, 10845 : 114 - 128
[5] Special section on natural language in software engineering
Sawyer, Pete
Gervasi, Vincenzo
[J]. IET SOFTWARE, 2008, 2 (01) : 1 - 2
[6] Natural Language Understanding of Systems Engineering Artifacts
Kulcsár, Géza
Constant, Olivier
Pruvost, Gaëtan
Ráth, István
Füzesi, Máté
Harmath, Dénes
[J]. INCOSE International Symposium, 2022, 32 (01) : 1373 - 1387
[7] Natural Language User Interface For Software Engineering Tasks
Wachtel, Alexander
Klamroth, Jonas
Tichy, Walter F.
[J]. ACHI 2017: THE TENTH INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTER-HUMAN INTERACTIONS, 2017, : 34 - 39
[8] Typefaces and the Perception of Humanness in Natural Language Chatbots
Candello, Heloisa
Pinhanez, Claudio
Figueiredo, Flavio
[J]. PROCEEDINGS OF THE 2017 ACM SIGCHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS (CHI'17), 2017, : 3476 - 3487
[9] When Natural Language Processing Jumps into Collaborative Software Engineering
Gilson, Fabian
Weyns, Danny
[J]. 2019 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ARCHITECTURE COMPANION (ICSA-C 2019), 2019, : 238 - 241
[10] The Use of Text Retrieval and Natural Language Processing in Software Engineering
Haiduc, Sonia
Arnaoudova, Venera
Marcus, Andrian
Antoniol, Giuliano
[J]. 2016 IEEE/ACM 38TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING COMPANION (ICSE-C), 2016, : 898 - 899
