Meta-evaluation of Conversational Search Evaluation Metrics

Cited by: 6
Authors
Liu, Zeyang [1]
Zhou, Ke [1, 2]
Wilson, Max L. [1]
Affiliations
[1] Univ Nottingham, Sch Comp Sci, Jubilee Campus Wollaton Rd, Nottingham NG8 1BB, England
[2] Nokia Bell Labs, Broers Bldg, Cambridge CB3 0FA, England
Keywords
Conversational search; meta-evaluation; metric; discriminative power
DOI
10.1145/3445029
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Conversational search systems, such as Google Assistant and Microsoft Cortana, enable users to interact with search systems over multiple rounds through natural language dialogue. Evaluating such systems is very challenging, given that any natural language response could be generated, and users commonly interact for multiple semantically coherent rounds to accomplish a search task. Although prior studies have proposed many evaluation metrics, the extent to which those measures effectively capture user preference remains to be investigated. In this article, we systematically meta-evaluate a variety of conversational search metrics. We specifically study three perspectives on those metrics: (1) reliability: the ability to detect "actual" performance differences as opposed to those observed by chance; (2) fidelity: the ability to agree with ultimate user preference; and (3) intuitiveness: the ability to capture any property deemed important, namely adequacy, informativeness, and fluency in the context of conversational search. By conducting experiments on two test collections, we find that the performance of different metrics varies significantly across scenarios and that, consistent with prior studies, existing metrics achieve only weak correlation with ultimate user preference and satisfaction. METEOR is, comparatively speaking, the best existing single-turn metric considering all three perspectives. We also demonstrate that adapted session-based evaluation metrics can be used to measure multi-turn conversational search, achieving moderate concordance with user satisfaction. To our knowledge, our work establishes the most comprehensive meta-evaluation for conversational search to date.
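The fidelity perspective named in the abstract (agreement with user preference) can be illustrated with a minimal sketch. The abstract does not describe how the authors compute it, so everything below is an assumption for illustration only: the function name `preference_agreement`, the pairwise setup with two candidate responses per item, the 'a'/'b' preference labels, and the choice to count metric ties as disagreement are all hypothetical.

```python
from typing import Sequence


def preference_agreement(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    user_prefs: Sequence[str],
) -> float:
    """Fraction of response pairs where the metric prefers the same
    response as the user.

    Hypothetical setup (not taken from the paper): for each pair i,
    scores_a[i] and scores_b[i] are the metric scores of two candidate
    responses, and user_prefs[i] is 'a' or 'b'. A tie in metric scores
    is counted as a disagreement.
    """
    if not (len(scores_a) == len(scores_b) == len(user_prefs)):
        raise ValueError("all inputs must have the same length")
    if len(user_prefs) == 0:
        raise ValueError("at least one pair is required")

    agree = 0
    for score_a, score_b, pref in zip(scores_a, scores_b, user_prefs):
        if score_a > score_b:
            metric_pref = "a"
        elif score_b > score_a:
            metric_pref = "b"
        else:
            metric_pref = None  # tie: the metric expresses no preference
        if metric_pref == pref:
            agree += 1
    return agree / len(user_prefs)


# Made-up scores for four response pairs, purely for illustration.
rate = preference_agreement(
    [0.8, 0.2, 0.5, 0.4],
    [0.3, 0.6, 0.5, 0.9],
    ["a", "b", "a", "a"],
)
print(rate)
```

In this toy input the metric sides with the user on the first two pairs, ties on the third, and disagrees on the fourth, so the agreement rate is 0.5. A rate near 0.5 on binary preferences would indicate that a metric is barely better than chance at predicting what users prefer.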
Pages: 42