Evaluating Coherence in Dialogue Systems using Entailment

Cited by: 0
Authors:
Dziri, Nouha [1 ]
Kamalloo, Ehsan [1 ]
Mathewson, Kory W. [1 ]
Zaiane, Osmar [1 ]
Affiliation:
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada
Keywords: none listed
DOI: not available
Chinese Library Classification (CLC):
TP18 [Artificial Intelligence Theory]
Subject Classification Codes:
081104; 0812; 0835; 1405
Abstract
Evaluating open-domain dialogue systems is difficult due to the diversity of possible correct answers. Automatic metrics such as BLEU correlate weakly with human annotations, resulting in a significant bias across different models and datasets. Some researchers resort to human judgment experimentation for assessing response quality, which is expensive, time-consuming, and not scalable. Moreover, judges tend to evaluate a small number of dialogues, meaning that minor differences in evaluation configuration may lead to dissimilar results. In this paper, we present interpretable metrics for evaluating topic coherence by making use of distributed sentence representations. Furthermore, we introduce calculable approximations of human judgment based on conversational coherence by adopting state-of-the-art entailment techniques. Results show that our metrics can be used as a surrogate for human judgment, making it easy to evaluate dialogue systems on large-scale datasets and allowing an unbiased estimate of the quality of the responses.
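The abstract describes two ingredients: (i) topic-coherence metrics built on distributed sentence representations, and (ii) an entailment-based proxy for human judgment, where the dialogue context plays the role of the premise and the candidate response the hypothesis of a natural language inference (NLI) problem. The sketch below illustrates both ideas under stated assumptions: the off-the-shelf `roberta-large-mnli` checkpoint (label order 0=contradiction, 1=neutral, 2=entailment) and the `all-MiniLM-L6-v2` sentence encoder are stand-ins chosen here for illustration; the paper trains its own entailment models on dialogue-derived data, so this is not the authors' exact pipeline.

```python
# Illustrative sketch of entailment-based coherence scoring.
# Assumptions (not from the paper): roberta-large-mnli as the NLI model
# (labels: 0=CONTRADICTION, 1=NEUTRAL, 2=ENTAILMENT) and all-MiniLM-L6-v2
# as the sentence encoder. The paper trains entailment models on
# dialogue-derived data; these checkpoints are stand-ins.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from sentence_transformers import SentenceTransformer, util

NLI_MODEL = "roberta-large-mnli"
nli_tokenizer = AutoTokenizer.from_pretrained(NLI_MODEL)
nli_model = AutoModelForSequenceClassification.from_pretrained(NLI_MODEL)
nli_model.eval()

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def entailment_coherence(context: str, response: str) -> float:
    """Premise = dialogue context, hypothesis = response; the entailment
    probability serves as a proxy for conversational coherence."""
    inputs = nli_tokenizer(context, response, return_tensors="pt",
                           truncation=True, max_length=512)
    with torch.no_grad():
        logits = nli_model(**inputs).logits
    probs = torch.softmax(logits, dim=-1).squeeze(0)
    return probs[2].item()  # index 2 = ENTAILMENT for this checkpoint

def topic_coherence(context: str, response: str) -> float:
    """Cosine similarity between sentence embeddings, a simple
    distributed-representation view of topic coherence."""
    emb = encoder.encode([context, response], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

# A coherent reply should outscore an off-topic one on both proxies.
context = "I just adopted a puppy. He chews on everything in the house."
on_topic = "Puppies love to chew; sturdy toys usually help a lot."
off_topic = "The stock market closed slightly higher today."
for reply in (on_topic, off_topic):
    print(entailment_coherence(context, reply), topic_coherence(context, reply))
```

For multi-turn dialogues, the turns of the history can simply be concatenated into the premise string. Per the abstract, such entailment-based scores track human coherence judgments closely enough to serve as a scalable surrogate.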
Pages: 3806-3812
Page count: 7
Related Papers
(50 records; first 10 shown below)
  • [1] Evaluating Dialogue Strategies in Multimodal Dialogue Systems
    Whittaker, Steve
    Walker, Marilyn
    SPOKEN MULTIMODAL HUMAN-COMPUTER DIALOGUE IN MOBILE ENVIRONMENTS, 2005, 28 : 247 - 268
  • [2] Coherence and transitivity of subtyping as entailment
    Longo, G
    Milsted, K
    Soloviev, S
    JOURNAL OF LOGIC AND COMPUTATION, 2000, 10 (04) : 493 - 526
  • [3] GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems
    Huang, Lishan
    Ye, Zheng
    Qin, Jinghui
    Lin, Liang
    Liang, Xiaodan
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 9230 - 9240
  • [4] Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark
    Dziri, Nouha
    Rashkin, Hannah
    Linzen, Tal
    Reitter, David
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2022, 10 : 1066 - 1083
  • [5] Evaluating Task-oriented Dialogue Systems with Users
    Siro, Clemencia
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 3495 - 3495
  • [6] Evaluating Dialogue Generation Systems via Response Selection
    Sato, Shiki
    Akama, Reina
    Ouchi, Hiroki
    Suzuki, Jun
    Inui, Kentaro
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 593 - 599
  • [7] Evaluating Coherence in Open Domain Conversational Systems
    Higashinaka, Ryuichiro
    Meguro, Toyomi
    Imamura, Kenji
    Sugiyama, Hiroaki
    Makino, Toshiro
    Matsuo, Yoshihiro
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 130 - 134
  • [8] Entailment, assertion, and textual coherence: the case of almost and barely
    Amaral, Patricia
    LINGUISTICS, 2010, 48 (03) : 525 - 545
  • [9] Coherence Models for Dialogue
    Cervone, Alessandra
    Stepanov, Evgeny A.
    Riccardi, Giuseppe
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1011 - 1015
  • [10] Evaluating Paraphrastic Robustness in Textual Entailment Models
    Verma, Dhruv
    Lal, Yash Kumar
    Sinha, Shreyashee
    Van Durme, Benjamin
    Poliak, Adam
    61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2, 2023, : 880 - 892