SLIDE: A Framework Integrating Small and Large Language Models for Open-Domain Dialogues Evaluation

Cited by: 0
Authors
Zhao, Kun [1 ]
Yang, Bohao [2 ]
Tang, Chen [2 ]
Lin, Chenghua [2 ]
Zhan, Liang [1 ]
Affiliations
[1] Univ Pittsburgh, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
[2] Univ Manchester, Dept Comp Sci, Manchester, Lancs, England
Funding
US National Science Foundation;
Keywords
ENERGY;
DOI
Not available
Abstract
The long-standing one-to-many problem of gold-standard responses in open-domain dialogue systems presents challenges for automatic evaluation metrics. Though prior works have demonstrated some success by applying powerful Large Language Models (LLMs), existing approaches still struggle with the one-to-many problem and exhibit subpar performance in domain-specific scenarios. We assume the commonsense reasoning biases within LLMs may hinder their performance in domain-specific evaluations. To address both issues, we propose a novel framework, SLIDE (Small and Large Integrated for Dialogue Evaluation), which leverages both a small, specialised model (SLM) and LLMs for the evaluation of open-domain dialogues. Our approach introduces several techniques: (1) contrastive learning to differentiate between robust and non-robust response embeddings; (2) a novel metric for semantic sensitivity that combines embedding cosine distances with similarity learned through neural networks; and (3) a strategy for incorporating the evaluation results from both the SLM and LLMs. Our empirical results demonstrate that our approach achieves state-of-the-art performance in both the classification and evaluation tasks, and additionally the SLIDE evaluator exhibits better correlation with human judgements. Our code is available at https://github.com/hegehongcha/SLIDE-ACL2024.
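The abstract describes two score combinations: blending embedding cosine similarity with a similarity learned by a neural network, and merging the SLM's and LLM's judgements. A minimal sketch of that idea follows; the function names, the weights `alpha` and `w`, and the simple weighted averages are illustrative assumptions, not the paper's actual formulation (the learned similarity is passed in as a precomputed number rather than produced by a network here):

```python
import math

def cosine_similarity(u, v):
    # Standard cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def semantic_sensitivity(resp_emb, ref_emb, learned_sim, alpha=0.5):
    # Blend raw cosine similarity with a similarity score that would come
    # from a trained network; the mixing weight alpha is an assumption.
    return alpha * cosine_similarity(resp_emb, ref_emb) + (1 - alpha) * learned_sim

def slide_score(slm_score, llm_score, w=0.5):
    # Combine the small specialised model's score with the LLM's score;
    # a plain weighted average stands in for the paper's strategy.
    return w * slm_score + (1 - w) * llm_score
```

For example, a response whose embedding is orthogonal to the reference (cosine 0) but which the learned model rates 0.8 would get a semantic-sensitivity score of 0.4 with the default weights.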
Pages: 15421-15435
Page count: 15
Related Papers
50 items in total
  • [1] Evaluating Open-Domain Question Answering in the Era of Large Language Models
    Kamalloo, Ehsan
    Dziri, Nouha
    Clarke, Charles L. A.
    Rafiei, Davood
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 5591 - 5606
  • [2] Between reality and delusion: challenges of applying large language models to companion robots for open-domain dialogues with older adults
    Irfan, Bahar
    Kuoppamäki, Sanna
    Hosseini, Aida
    Skantze, Gabriel
    AUTONOMOUS ROBOTS, 2025, 49 (01)
  • [3] Open-Domain Question Answering over Tables with Large Language Models
    Liang, Xinyi
    Hu, Rui
    Liu, Yu
    Zhu, Konglin
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024, 2024, 14873 : 347 - 358
  • [4] Large Language Models for Automated Open-domain Scientific Hypotheses Discovery
    Yang, Zonglin
    Du, Xinya
    Li, Junxian
    Zheng, Jie
    Poria, Soujanya
    Cambria, Erik
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 13545 - 13565
  • [5] Integrating Open-domain Knowledge via Large Language Model for Multimodal Fake News Detection
    Xie, Anbin
    Zhu, Fuqing
    Han, Jizhong
    Hu, Songlin
    PROCEEDINGS OF THE 2024 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024, 2024, : 1917 - 1922
  • [6] Proxy Indicators for the Quality of Open-domain Dialogues
    Nedelchev, Rostislav
    Lehmann, Jens
    Usbeck, Ricardo
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 7834 - 7855
  • [7] An Open-Domain Avatar Chatbot by Exploiting a Large Language Model
    Yamazaki, Takato
    Mizumoto, Tomoya
    Yoshikawa, Katsumasa
    Ohagi, Masaya
    Kawamoto, Toshiki
    Sato, Toshinori
    24TH MEETING OF THE SPECIAL INTEREST GROUP ON DISCOURSE AND DIALOGUE, SIGDIAL 2023, 2023, : 428 - 432
  • [8] Implicit Discourse Relation Identification for Open-domain Dialogues
    Ma, Mingyu Derek
    Bowden, Kevin K.
    Wu, Jiaqi
    Cui, Wen
    Walker, Marilyn
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 666 - 672
  • [9] Building a Role Specified Open-Domain Dialogue System Leveraging Large-Scale Language Models
    Bae, Sanghwan
    Kwak, Donghyun
    Kim, Sungdong
    Ham, Donghoon
    Kang, Soyoung
    Lee, Sang-Woo
    Park, Woomyoung
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 2128 - 2150
  • [10] RHO (ρ): Reducing Hallucination in Open-domain Dialogues with Knowledge Grounding
    Ji, Ziwei
    Liu, Zihan
    Lee, Nayeon
    Yu, Tiezheng
    Wilie, Bryan
    Zeng, Min
    Fung, Pascale
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 4504 - 4522