Evaluating the Cranfield Paradigm for Conversational Search Systems

Cited by: 8
Authors
Fu, Xiao [1]
Yilmaz, Emine [1, 2]
Lipani, Aldo [1]
Affiliations
[1] UCL, London, England
[2] Amazon, London, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
dialogue systems; evaluation; relevance; satisfaction; gain-based evaluation
DOI
10.1145/3539813.3545126
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Due to the sequential and interactive nature of conversations, applying traditional Information Retrieval (IR) methods such as the Cranfield paradigm requires stronger assumptions. When building a test collection for ad hoc search, it is fair to assume that the relevance judgments provided by an annotator correlate well with the relevance judgments perceived by an actual user of the search engine. However, when building a test collection for conversational search, we do not know whether it is fair to assume the same. In this paper, we perform a crowdsourcing study to evaluate the applicability of the Cranfield paradigm to conversational search systems. Our main aim is to understand the level of agreement, in terms of user satisfaction, between the users performing a search task in a conversational search system (i.e., directly assessing the system) and the users observing the search task being performed (i.e., indirectly assessing the system). The results of this study are paramount because they underpin and guide 1) the development of more realistic user models and simulators, and 2) the design of more reliable and robust evaluation measures for conversational search systems. Our results show that there is fair agreement between direct and indirect assessments in terms of user satisfaction and that these two kinds of assessments share similar conversational patterns. Indeed, by collecting relevance assessments for each system utterance, we tested several conversational patterns that show a promising ability to predict user satisfaction.
Pages: 196-201
Page count: 6
Related papers
50 in total
  • [1] How Am I Doing?: Evaluating Conversational Search Systems Offline
    Lipani, Aldo
    Carterette, Ben
    Yilmaz, Emine
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2021, 39 (04)
  • [2] Is the Cranfield Paradigm Outdated?
    Harman, Donna
    SIGIR 2010: PROCEEDINGS OF THE 33RD ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2010: 1-1
  • [3] Evaluating Mixed-initiative Conversational Search Systems via User Simulation
    Sekulic, Ivan
    Aliannejadi, Mohammad
    Crestani, Fabio
    WSDM'22: PROCEEDINGS OF THE FIFTEENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, 2022: 888-896
  • [5] Evaluating Coherence in Open Domain Conversational Systems
    Higashinaka, Ryuichiro
    Meguro, Toyomi
    Imamura, Kenji
    Sugiyama, Hiroaki
    Makino, Toshiro
    Matsuo, Yoshihiro
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014: 130-134
  • [6] Generating Clarifying Questions in Conversational Search Systems
    Tavakoli, Leila
    CIKM '20: PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, 2020: 3253-3256
  • [7] An Analysis of Stopping Strategies in Conversational Search Systems
    Fu, Xiao
    Perez-Ortiz, Maria
    Lipani, Aldo
    PROCEEDINGS OF THE 2024 ACM SIGIR INTERNATIONAL CONFERENCE ON THE THEORY OF INFORMATION RETRIEVAL, ICTIR 2024, 2024: 247-257
  • [8] Priming and Actions: An Analysis in Conversational Search Systems
    Fu, Xiao
    Lipani, Aldo
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023: 2277-2281
  • [9] Towards a Method For Evaluating Naturalness in Conversational Dialog Systems
    Hung, Victor
    Elvir, Miguel
    Gonzalez, Avelino
    DeMara, Ronald
    2009 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2009), VOLS 1-9, 2009: 1236-1241
  • [10] Evaluating Conversational Recommender Systems via User Simulation
    Zhang, Shuo
    Balog, Krisztian
    KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020: 1512-1520