Evaluating Coherence in Dialogue Systems using Entailment

Cited by: 0

Authors:

Dziri, Nouha [1]

Kamalloo, Ehsan [1]

Mathewson, Kory W. [1]

Zaiane, Osmar [1]

Affiliation:

[1] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada

Source:

2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1 | 2019

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Evaluating open-domain dialogue systems is difficult due to the diversity of possible correct answers. Automatic metrics such as BLEU correlate weakly with human annotations, resulting in a significant bias across different models and datasets. Some researchers resort to human judgment experimentation for assessing response quality, which is expensive, time consuming, and not scalable. Moreover, judges tend to evaluate a small number of dialogues, meaning that minor differences in evaluation configuration may lead to dissimilar results. In this paper, we present interpretable metrics for evaluating topic coherence by making use of distributed sentence representations. Furthermore, we introduce calculable approximations of human judgment based on conversational coherence by adopting state-of-the-art entailment techniques. Results show that our metrics can be used as a surrogate for human judgment, making it easy to evaluate dialogue systems on large-scale datasets and allowing an unbiased estimate for the quality of the responses.

Pages: 3806-3812

Page count: 7

