SLIDE: A Framework Integrating Small and Large Language Models for Open-Domain Dialogues Evaluation

Cited by: 0
Authors
Zhao, Kun [1 ]
Yang, Bohao [2 ]
Tang, Chen [2 ]
Lin, Chenghua [2 ]
Zhan, Liang [1 ]
Affiliations
[1] Univ Pittsburgh, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
[2] Univ Manchester, Dept Comp Sci, Manchester, Lancs, England
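Funding
US National Science Foundation;
Keywords
ENERGY;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract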
The long-standing one-to-many problem of gold standard responses in open-domain dialogue systems presents challenges for automatic evaluation metrics. Though prior works have demonstrated some success by applying powerful Large Language Models (LLMs), existing approaches still struggle with the one-to-many problem, and exhibit subpar performance in domain-specific scenarios. We assume the commonsense reasoning biases within LLMs may hinder their performance in domain-specific evaluations. To address both issues, we propose a novel framework SLIDE (Small and Large Integrated for Dialogue Evaluation), that leverages both a small, specialised model (SLM), and LLMs for the evaluation of open domain dialogues. Our approach introduces several techniques: (1) Contrastive learning to differentiate between robust and non-robust response embeddings; (2) A novel metric for semantic sensitivity that combines embedding cosine distances with similarity learned through neural networks, and (3) A strategy for incorporating the evaluation results from both the SLM and LLMs. Our empirical results demonstrate that our approach achieves state-of-the-art performance in both the classification and evaluation tasks, and additionally the SLIDE evaluator exhibits better correlation with human judgements. Our code is available at https://github.com/hegehongcha/SLIDE-ACL2024.
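Techniques (2) and (3) in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration and is not the authors' implementation: the function names, the linear weighting, the parameters `alpha` and `weight`, and the rescaling of cosine distance to a [0, 1] similarity are all hypothetical.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def slm_score(context_emb, response_emb, learned_similarity, alpha=0.5):
    """Hypothetical SLM score: mixes an embedding-based similarity with a
    similarity produced by a trained network (passed in as a number in [0, 1]).

    The linear mix and `alpha` are assumptions, not taken from the paper.
    """
    # Cosine distance lies in [0, 2]; rescale it to a similarity in [0, 1].
    cos_sim = 1.0 - cosine_distance(context_emb, response_emb) / 2.0
    return alpha * cos_sim + (1.0 - alpha) * learned_similarity


def combined_score(slm, llm, weight=0.5):
    """Hypothetical integration of the SLM and LLM scores (both in [0, 1])."""
    return weight * slm + (1.0 - weight) * llm


if __name__ == "__main__":
    # Toy embeddings and scores, chosen only to exercise the functions.
    s = slm_score([1.0, 0.0], [1.0, 0.0], learned_similarity=0.8)
    print(combined_score(s, llm=0.6))
```

A weighted average is only one possible integration strategy; the paper's repository linked above is the place to check how the SLM and LLM results are actually combined.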
Pages: 15421-15435
Page count: 15
Related papers
50 in total
  • [21] DISTRIBUTED OPEN-DOMAIN CONVERSATIONAL UNDERSTANDING FRAMEWORK WITH DOMAIN INDEPENDENT EXTRACTORS
    Li, Qi
    Tur, Gokhan
    Hakkani-Tur, Dilek
    Li, Xiang
    Paek, Tim
    Gunawardana, Asela
    Quirk, Chris
    2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014, 2014, : 566 - 571
  • [22] SHONGLAP: A Large Bengali Open-Domain Dialogue Corpus
    Monsur, Syed Mostofa
    Chowdhury, Sakib
    Fatemi, Md Shahrar
    Ahmed, Shafayat
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 5797 - 5804
  • [23] Towards a small language model powered chain-of-reasoning for open-domain question answering
    Roh, Jihyeon
    Kim, Minho
    Bae, Kyoungman
    ETRI JOURNAL, 2024, 46 (01) : 11 - 21
  • [24] Distilling the Knowledge of Large-scale Generative Models into Retrieval Models for Efficient Open-domain Conversation
    Kim, Beomsu
    Seo, Seokjun
    Han, Seungju
    Erdenee, Enkhbayar
    Chang, Buru
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 3357 - 3373
  • [25] Intent-calibrated Self-training for Answer Selection in Open-domain Dialogues
    Deng, Wentao
    Pei, Jiahuan
    Ren, Zhaochun
    Chen, Zhumin
    Ren, Pengjie
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2023, 11 : 1232 - 1249
  • [26] Evaluating Open-Domain Dialogues in Latent Space with Next Sentence Prediction and Mutual Information
    Zhao, Kun
    Yang, Bohao
    Lin, Chenghua
    Rong, Wenge
    Villavicencio, Aline
    Cui, Xiaohui
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 562 - 574
  • [27] Learning Strategies for Open-Domain Natural Language Question Answering
    Grois, Eugene
    Wilkins, David C.
    19TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-05), 2005, : 1054 - 1060
  • [28] Personality prediction from task-oriented and open-domain human–machine dialogues
    Ao Guo
    Ryu Hirai
    Atsumoto Ohashi
    Yuya Chiba
    Yuiko Tsunomori
    Ryuichiro Higashinaka
    Scientific Reports, 14
  • [29] Enhancing the Open-Domain Dialogue Evaluation in Latent Space
    Chan, Zhangming
    Liu, Lemao
    Li, Juntao
    Zhang, Haisong
    Zhao, Dongyan
    Shi, Shuming
    Yan, Rui
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 4889 - 4900
  • [30] Text is NOT Enough: Integrating Visual Impressions into Open-domain Dialogue Generation
    Shen, Lei
    Zhan, Haolan
    Shen, Xin
    Song, Yonghao
    Zhao, Xiaofang
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4287 - 4296