Evaluating Coherence in Dialogue Systems using Entailment

Cited by: 0
Authors:
Dziri, Nouha [1 ]
Kamalloo, Ehsan [1 ]
Mathewson, Kory W. [1 ]
Zaiane, Osmar [1 ]
Affiliation:
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada
Keywords: none listed
DOI: not available
Chinese Library Classification (CLC):
TP18 [Artificial Intelligence Theory]
Subject Classification Codes:
081104; 0812; 0835; 1405
Abstract
Evaluating open-domain dialogue systems is difficult due to the diversity of possible correct answers. Automatic metrics such as BLEU correlate weakly with human annotations, resulting in a significant bias across different models and datasets. Some researchers resort to human judgment experimentation for assessing response quality, which is expensive, time-consuming, and not scalable. Moreover, judges tend to evaluate a small number of dialogues, meaning that minor differences in evaluation configuration may lead to dissimilar results. In this paper, we present interpretable metrics for evaluating topic coherence by making use of distributed sentence representations. Furthermore, we introduce calculable approximations of human judgment based on conversational coherence by adopting state-of-the-art entailment techniques. Results show that our metrics can be used as a surrogate for human judgment, making it easy to evaluate dialogue systems on large-scale datasets and allowing an unbiased estimate of the quality of the responses.
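The abstract describes two ingredients: (i) topic-coherence metrics built on distributed sentence representations, and (ii) an entailment-based proxy for human judgment, where the dialogue context plays the role of the premise and the candidate response the hypothesis of a natural language inference (NLI) problem. The sketch below illustrates both ideas under stated assumptions: the off-the-shelf `roberta-large-mnli` checkpoint (label order 0=contradiction, 1=neutral, 2=entailment) and the `all-MiniLM-L6-v2` sentence encoder are stand-ins chosen here for illustration; the paper trains its own entailment models on dialogue-derived data, so this is not the authors' exact pipeline.

```python
# Illustrative sketch of entailment-based coherence scoring.
# Assumptions (not from the paper): roberta-large-mnli as the NLI model
# (labels: 0=CONTRADICTION, 1=NEUTRAL, 2=ENTAILMENT) and all-MiniLM-L6-v2
# as the sentence encoder. The paper trains entailment models on
# dialogue-derived data; these checkpoints are stand-ins.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from sentence_transformers import SentenceTransformer, util

NLI_MODEL = "roberta-large-mnli"
nli_tokenizer = AutoTokenizer.from_pretrained(NLI_MODEL)
nli_model = AutoModelForSequenceClassification.from_pretrained(NLI_MODEL)
nli_model.eval()

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def entailment_coherence(context: str, response: str) -> float:
    """Premise = dialogue context, hypothesis = response; the entailment
    probability serves as a proxy for conversational coherence."""
    inputs = nli_tokenizer(context, response, return_tensors="pt",
                           truncation=True, max_length=512)
    with torch.no_grad():
        logits = nli_model(**inputs).logits
    probs = torch.softmax(logits, dim=-1).squeeze(0)
    return probs[2].item()  # index 2 = ENTAILMENT for this checkpoint

def topic_coherence(context: str, response: str) -> float:
    """Cosine similarity between sentence embeddings, a simple
    distributed-representation view of topic coherence."""
    emb = encoder.encode([context, response], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

# A coherent reply should outscore an off-topic one on both proxies.
context = "I just adopted a puppy. He chews on everything in the house."
on_topic = "Puppies love to chew; sturdy toys usually help a lot."
off_topic = "The stock market closed slightly higher today."
for reply in (on_topic, off_topic):
    print(entailment_coherence(context, reply), topic_coherence(context, reply))
```

For multi-turn dialogues, the turns of the history can simply be concatenated into the premise string. Per the abstract, such entailment-based scores track human coherence judgments closely enough to serve as a scalable surrogate.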
Pages: 3806-3812
Page count: 7
Related Papers
(50 records; first 10 shown below)
  • [1] Evaluating Dialogue Strategies in Multimodal Dialogue Systems
    Whittaker, Steve
    Walker, Marilyn
    SPOKEN MULTIMODAL HUMAN-COMPUTER DIALOGUE IN MOBILE ENVIRONMENTS, 2005, 28 : 247 - 268
  • [2] Coherence and transitivity of subtyping as entailment
    Longo, G
    Milsted, K
    Soloviev, S
    JOURNAL OF LOGIC AND COMPUTATION, 2000, 10 (04) : 493 - 526
  • [3] GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems
    Huang, Lishan
    Ye, Zheng
    Qin, Jinghui
    Lin, Liang
    Liang, Xiaodan
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 9230 - 9240
  • [4] Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark
    Dziri, Nouha
    Rashkin, Hannah
    Linzen, Tal
    Reitter, David
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2022, 10 : 1066 - 1083
  • [5] Evaluating Task-oriented Dialogue Systems with Users
    Siro, Clemencia
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 3495 - 3495
  • [6] Evaluating Dialogue Generation Systems via Response Selection
    Sato, Shiki
    Akama, Reina
    Ouchi, Hiroki
    Suzuki, Jun
    Inui, Kentaro
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 593 - 599
  • [7] Evaluating Coherence in Open Domain Conversational Systems
    Higashinaka, Ryuichiro
    Meguro, Toyomi
    Imamura, Kenji
    Sugiyama, Hiroaki
    Makino, Toshiro
    Matsuo, Yoshihiro
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 130 - 134
  • [8] Entailment, assertion, and textual coherence: the case of almost and barely
    Amaral, Patricia
    LINGUISTICS, 2010, 48 (03) : 525 - 545
  • [9] Coherence Models for Dialogue
    Cervone, Alessandra
    Stepanov, Evgeny A.
    Riccardi, Giuseppe
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1011 - 1015
  • [10] Evaluating Paraphrastic Robustness in Textual Entailment Models
    Verma, Dhruv
    Lal, Yash Kumar
    Sinha, Shreyashee
    Van Durme, Benjamin
    Poliak, Adam
    61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2, 2023, : 880 - 892