A Comparison of Natural Language Understanding Platforms for Chatbots in Software Engineering

Cited by: 30
Authors
Abdellatif, Ahmad [1]
Badran, Khaled [1]
Costa, Diego Elias [1]
Shihab, Emad [1]
Affiliations
[1] Concordia Univ, Dept Comp Sci & Software Engn, Data Driven Anal Software DAS Lab, Montreal, PQ H3G 1M8, Canada
Keywords
Software chatbots; natural language understanding platforms; empirical software engineering; COEFFICIENT;
DOI
10.1109/TSE.2021.3078384
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Chatbots are envisioned to dramatically change the future of Software Engineering, allowing practitioners to chat and inquire about their software projects and interact with different services using natural language. At the heart of every chatbot is a Natural Language Understanding (NLU) component that enables the chatbot to understand natural language input. Recently, many NLU platforms have been provided to serve as off-the-shelf NLU components for chatbots; however, selecting the best NLU for Software Engineering chatbots remains an open challenge. Therefore, in this paper, we evaluate four of the most commonly used NLUs, namely IBM Watson, Google Dialogflow, Rasa, and Microsoft LUIS, to shed light on which NLU should be used in Software Engineering based chatbots. Specifically, we examine the NLUs' performance in classifying intents, the stability of their confidence scores, and extracting entities. To evaluate the NLUs, we use two datasets that reflect two common tasks performed by Software Engineering practitioners: 1) the task of chatting with the chatbot to ask questions about software repositories, and 2) the task of asking development questions on Q&A forums (e.g., Stack Overflow). According to our findings, IBM Watson is the best performing NLU when considering the three aspects (intents classification, confidence scores, and entity extraction). However, the results from each individual aspect show that, in intents classification, IBM Watson performs the best with an F1-measure > 84%, but in confidence scores, Rasa comes out on top with a median confidence score higher than 0.91. Our results also show that all NLUs, except for Dialogflow, generally provide trustable confidence scores. For entity extraction, Microsoft LUIS and IBM Watson outperform other NLUs in the two SE tasks. Our results provide guidance to software engineering practitioners when deciding which NLU to use in their chatbots.
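The abstract reports intent classification quality as an F1-measure. As a rough illustration of how such a score can be computed from gold and predicted intent labels, here is a minimal sketch in plain Python. The intent names, the sample data, and the choice of a support-weighted average are assumptions made for illustration only; the abstract does not state which averaging the authors used.

```python
from collections import Counter


def f1_per_intent(y_true, y_pred):
    """Return {intent: F1} from parallel lists of gold and predicted intents."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted intent was wrong
            fn[gold] += 1  # gold intent was missed
    scores = {}
    for intent in set(y_true) | set(y_pred):
        p_den = tp[intent] + fp[intent]
        r_den = tp[intent] + fn[intent]
        precision = tp[intent] / p_den if p_den else 0.0
        recall = tp[intent] / r_den if r_den else 0.0
        denom = precision + recall
        scores[intent] = 2 * precision * recall / denom if denom else 0.0
    return scores


def weighted_f1(y_true, y_pred):
    """Average the per-intent F1 scores, weighted by each intent's gold count."""
    support = Counter(y_true)
    scores = f1_per_intent(y_true, y_pred)
    return sum(scores[i] * n for i, n in support.items()) / len(y_true)


# Hypothetical intents and predictions, not taken from the paper's datasets.
gold = ["count_commits", "count_commits", "fix_commit", "fix_commit"]
predicted = ["count_commits", "fix_commit", "fix_commit", "fix_commit"]
per_intent = f1_per_intent(gold, predicted)
overall = weighted_f1(gold, predicted)
```

Weighting by support keeps frequent intents from being drowned out by rare ones; a macro average (plain mean of the per-intent scores) would be the alternative when every intent should count equally.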
Pages: 3087-3102
Page count: 16