Evaluating Coherence in Dialogue Systems using Entailment

Times cited: 0
Authors
Dziri, Nouha [1]
Kamalloo, Ehsan [1]
Mathewson, Kory W. [1]
Zaiane, Osmar [1]
Affiliation
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Evaluating open-domain dialogue systems is difficult due to the diversity of possible correct answers. Automatic metrics such as BLEU correlate weakly with human annotations, resulting in a significant bias across different models and datasets. Some researchers instead resort to human judgment experiments to assess response quality, which are expensive, time-consuming, and not scalable. Moreover, judges tend to evaluate a small number of dialogues, meaning that minor differences in evaluation configuration may lead to dissimilar results. In this paper, we present interpretable metrics for evaluating topic coherence by making use of distributed sentence representations. Furthermore, we introduce calculable approximations of human judgment based on conversational coherence by adopting state-of-the-art entailment techniques. Results show that our metrics can be used as a surrogate for human judgment, making it easy to evaluate dialogue systems on large-scale datasets and allowing an unbiased estimate of the quality of the responses.
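To make the idea of an embedding-based topic-coherence score concrete, here is a minimal sketch. The abstract does not say which sentence encoder or similarity function the authors use, so this is not their implementation: it assumes averaged word vectors as the sentence representation and cosine similarity between context and response, with a hypothetical toy vocabulary (`TOY_VECTORS`) standing in for pretrained embeddings. It also includes a plain Pearson correlation, one common way to check how well such a metric tracks human ratings.

```python
import math
import re
from typing import Dict, List, Sequence

# HYPOTHETICAL toy word vectors; a real system would use a pretrained
# word or sentence embedding model instead.
TOY_VECTORS: Dict[str, List[float]] = {
    "weather": [0.9, 0.1, 0.0],
    "rain": [0.8, 0.2, 0.1],
    "sunny": [0.85, 0.05, 0.1],
    "pizza": [0.0, 0.9, 0.3],
    "cheese": [0.1, 0.8, 0.4],
}


def sentence_vector(text: str, vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of in-vocabulary tokens (bag-of-words embedding)."""
    dim = len(next(iter(vectors.values())))
    tokens = [t for t in re.findall(r"[a-z']+", text.lower()) if t in vectors]
    if not tokens:
        return [0.0] * dim  # no known words: fall back to a zero vector
    return [sum(vectors[t][i] for t in tokens) / len(tokens) for i in range(dim)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; defined here as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def topic_coherence(context: str, response: str,
                    vectors: Dict[str, List[float]] = TOY_VECTORS) -> float:
    """Score a response by the embedding similarity to its dialogue context."""
    return cosine(sentence_vector(context, vectors),
                  sentence_vector(response, vectors))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, e.g. between metric scores and human ratings.

    Raises ZeroDivisionError if either input is constant.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    context = "What is the weather like tomorrow?"
    print(topic_coherence(context, "Rain early, then sunny."))    # on topic: high
    print(topic_coherence(context, "I love pizza with cheese."))  # off topic: low
```

Swapping `TOY_VECTORS` for a pretrained encoder only changes `sentence_vector`. The entailment-based scores mentioned in the abstract would additionally require a trained natural language inference classifier, which this sketch does not attempt.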
Pages: 3806-3812
Page count: 7
Related papers
50 in total
  • [31] Probabilistic entailment in the setting of coherence: The role of quasi conjunction and inclusion relation
    Gilio, Angelo
    Sanfilippo, Giuseppe
    INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2013, 54 (04) : 513 - 525
  • [32] Distilling dialogues - A method using natural dialogue corpora for dialogue systems development
    Jönsson, A
    Dahlbäck, N
    6TH APPLIED NATURAL LANGUAGE PROCESSING CONFERENCE/1ST MEETING OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE AND PROCEEDINGS OF THE ANLP-NAACL 2000 STUDENT RESEARCH WORKSHOP, 2000, : 44 - 51
  • [33] Using dialogue acts to learn better repair strategies for Spoken Dialogue Systems
    Frampton, Matthew
    Lemon, Oliver
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 5045 - +
  • [34] Coherence and flexibility in dialogue games for argumentation
    Prakken, H
    JOURNAL OF LOGIC AND COMPUTATION, 2005, 15 (06) : 1009 - 1040
  • [35] Towards Quantifiable Dialogue Coherence Evaluation
    Ye, Zheng
    Lu, Liucun
    Huang, Lishan
    Lin, Liang
    Liang, Xiaodan
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 2718 - 2729
  • [36] Evaluating and Enhancing the Robustness of Retrieval-Based Dialogue Systems with Adversarial Examples
    Li, Jia
    Tao, Chongyang
    Peng, Nanyun
    Wu, Wei
    Zhao, Dongyan
    Yan, Rui
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING (NLPCC 2019), PT I, 2019, 11838 : 142 - 154
  • [37] Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems
    Wan, Yixin
    Zhao, Jieyu
    Chadha, Aman
    Peng, Nanyun
    Chang, Kai-Wei
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 9677 - 9705
  • [38] DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations
    Ghazarian, Sarik
    Wen, Nuan
    Galstyan, Aram
    Peng, Nanyun
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 771 - 785
  • [39] Entailment systems for stably locally compact locales
    Vickers, S
    THEORETICAL COMPUTER SCIENCE, 2004, 316 (1-3) : 259 - 296
  • [40] Evaluating and identifying pearls and their nuclei by using optical coherence tomography
    Ju, Myeong Jin
    Lee, Sang Jin
    Min, Eun Jung
    Kim, Yuri
    Kim, Hae Yeon
    Lee, Byeong Ha
    OPTICS EXPRESS, 2010, 18 (13) : 13468 - 13477