Evaluating Coherence in Dialogue Systems using Entailment

Times cited: 0
Authors
Dziri, Nouha [1]
Kamalloo, Ehsan [1]
Mathewson, Kory W. [1]
Zaiane, Osmar [1]
Affiliations
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada
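Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract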
Evaluating open-domain dialogue systems is difficult due to the diversity of possible correct answers. Automatic metrics such as BLEU correlate weakly with human annotations, resulting in a significant bias across different models and datasets. Some researchers resort to human judgment experimentation for assessing response quality, which is expensive, time consuming, and not scalable. Moreover, judges tend to evaluate a small number of dialogues, meaning that minor differences in evaluation configuration may lead to dissimilar results. In this paper, we present interpretable metrics for evaluating topic coherence by making use of distributed sentence representations. Furthermore, we introduce calculable approximations of human judgment based on conversational coherence by adopting state-of-the-art entailment techniques. Results show that our metrics can be used as a surrogate for human judgment, making it easy to evaluate dialogue systems on large-scale datasets and allowing an unbiased estimate for the quality of the responses.
Pages: 3806 - 3812
Page count: 7
Related papers
50 in total
  • [41] Evaluating automatic dialogue strategy adaptation for a spoken dialogue system
    Chu-Carroll, J
    Nickerson, JS
    6TH APPLIED NATURAL LANGUAGE PROCESSING CONFERENCE/1ST MEETING OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE AND PROCEEDINGS OF THE ANLP-NAACL 2000 STUDENT RESEARCH WORKSHOP, 2000, : A202 - A209
  • [42] Answer Validation Using Textual Entailment
    Pakray, Partha
    Gelbukh, Alexander
    Bandyopadhyay, Sivaji
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, PT II, 2011, 6609 : 353 - +
  • [43] Designing persuasive dialogue systems: Using argumentation with care
    Nguyen, Hien
    Masthoff, Judith
    PERSUASIVE TECHNOLOGY, 2008, 5033 : 201 - 212
  • [45] Evaluating spoken dialogue systems according to de-facto standards: A case study
    Moeller, Sebastian
    Smeele, Paula
    Boland, Heleen
    Krebber, Jan
    COMPUTER SPEECH AND LANGUAGE, 2007, 21 (01): 26 - 53
  • [46] Dynamic Coherence in the Dialogue of Subjects A study based on Bakhtin's Theory of Dialogue
    Ye, Danmin
    Wang, Dongzhu
    CHINESE SEMIOTIC STUDIES, 2020, 16 (01) : 105 - 118
  • [47] A Network Approach for Evaluating Coherence in Multivariate Systems: An Application to Psychophysiological Emotion Data
    Fushing Hsieh
    Emilio Ferrer
    Shuchun Chen
    Iris B. Mauss
    Oliver John
    James J. Gross
    Psychometrika, 2011, 76 : 124 - 152
  • [48] Dialogue Segmentation based on Dynamic Context Coherence
    Pu, Hengfeng
    Wang, Liqing
    PROCEEDINGS OF 2023 7TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, NLPIR 2023, 2023, : 190 - 195
  • [49] A NETWORK APPROACH FOR EVALUATING COHERENCE IN MULTIVARIATE SYSTEMS: AN APPLICATION TO PSYCHOPHYSIOLOGICAL EMOTION DATA
    Hsieh, Fushing
    Ferrer, Emilio
    Chen, Shuchun
    Mauss, Iris B.
    John, Oliver
    Gross, James J.
    PSYCHOMETRIKA, 2011, 76 (01) : 124 - 152
  • [50] Evaluating Interreligious Dialogue in the Middle East
    Driessen, Michael Daniel
    PEACE REVIEW-A JOURNAL OF SOCIAL JUSTICE, 2020, 32 (01): 1 - 12