Evaluating Coherence in Dialogue Systems using Entailment

Cited by: 0

Authors:

Dziri, Nouha [1]

Kamalloo, Ehsan [1]

Mathewson, Kory W. [1]

Zaiane, Osmar [1]

Affiliations:

[1] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada

Source:

2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1 | 2019

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Evaluating open-domain dialogue systems is difficult due to the diversity of possible correct answers. Automatic metrics such as BLEU correlate weakly with human annotations, resulting in a significant bias across different models and datasets. Some researchers resort to human judgment experimentation for assessing response quality, which is expensive, time consuming, and not scalable. Moreover, judges tend to evaluate a small number of dialogues, meaning that minor differences in evaluation configuration may lead to dissimilar results. In this paper, we present interpretable metrics for evaluating topic coherence by making use of distributed sentence representations. Furthermore, we introduce calculable approximations of human judgment based on conversational coherence by adopting state-of-the-art entailment techniques. Results show that our metrics can be used as a surrogate for human judgment, making it easy to evaluate dialogue systems on large-scale datasets and allowing an unbiased estimate for the quality of the responses.


Pages: 3806-3812

Page count: 7
