Evaluating Coherence in Dialogue Systems using Entailment

Authors
Dziri, Nouha [1]
Kamalloo, Ehsan [1]
Mathewson, Kory W. [1]
Zaiane, Osmar [1]
Affiliations
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada
Source
2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1 | 2019
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Evaluating open-domain dialogue systems is difficult due to the diversity of possible correct answers. Automatic metrics such as BLEU correlate weakly with human annotations, resulting in a significant bias across different models and datasets. Some researchers resort to human judgment experimentation for assessing response quality, which is expensive, time-consuming, and not scalable. Moreover, judges tend to evaluate a small number of dialogues, meaning that minor differences in evaluation configuration may lead to dissimilar results. In this paper, we present interpretable metrics for evaluating topic coherence by making use of distributed sentence representations. Furthermore, we introduce calculable approximations of human judgment based on conversational coherence by adopting state-of-the-art entailment techniques. Results show that our metrics can be used as a surrogate for human judgment, making it easy to evaluate dialogue systems on large-scale datasets and allowing an unbiased estimate for the quality of the responses.
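The abstract does not say how the topic-coherence metrics are computed. As a rough illustration only, one common way to compare distributed sentence representations is cosine similarity between the vector of a conversation context and the vector of a candidate response. The sketch below assumes the sentence vectors already exist as plain lists of floats; the function names `cosine_similarity` and `coherence_score` and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # A zero vector has no direction; treat its similarity as 0.0.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def coherence_score(context_vec, response_vec):
    """Hypothetical coherence proxy: similarity of context and response.

    Assumption: both arguments are sentence embeddings produced by some
    encoder chosen by the reader; the encoder is outside this sketch.
    """
    return cosine_similarity(context_vec, response_vec)


if __name__ == "__main__":
    # Toy, hand-written vectors standing in for real sentence embeddings.
    context = [0.2, 0.7, 0.1]
    on_topic = [0.25, 0.65, 0.05]
    off_topic = [0.9, 0.0, 0.4]
    print(coherence_score(context, on_topic))
    print(coherence_score(context, off_topic))
```

A higher score here only means the two vectors point in a similar direction; whether that tracks human judgments of coherence depends on the encoder and would need the kind of validation the abstract describes.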
Pages: 3806-3812
Page count: 7