Evaluating the Cranfield Paradigm for Conversational Search Systems

Cited by: 8
Authors
Fu, Xiao [1]
Yilmaz, Emine [1, 2]
Lipani, Aldo [1]
Affiliations
[1] UCL, London, England
[2] Amazon, London, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
dialogue systems; evaluation; relevance; satisfaction; gain-based evaluation
DOI
10.1145/3539813.3545126
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Due to the sequential and interactive nature of conversations, applying traditional Information Retrieval (IR) methods such as the Cranfield paradigm requires stronger assumptions. When building a test collection for ad hoc search, it is fair to assume that the relevance judgments provided by an annotator correlate well with the relevance perceived by an actual user of the search engine. However, when building a test collection for conversational search, we do not know whether the same assumption holds. In this paper, we perform a crowdsourcing study to evaluate the applicability of the Cranfield paradigm to conversational search systems. Our main aim is to understand the level of agreement, in terms of user satisfaction, between users performing a search task in a conversational search system (i.e., directly assessing the system) and users observing the search task being performed (i.e., indirectly assessing the system). The results of this study are paramount because they underpin and guide 1) the development of more realistic user models and simulators, and 2) the design of more reliable and robust evaluation measures for conversational search systems. Our results show that there is fair agreement between direct and indirect assessments in terms of user satisfaction and that these two kinds of assessments share similar conversational patterns. Indeed, by collecting relevance assessments for each system utterance, we tested several conversational patterns that show a promising ability to predict user satisfaction.
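The abstract does not say which statistic was used to measure agreement between direct and indirect assessments. As an illustration only, the sketch below assumes Cohen's kappa, a common chance-corrected agreement measure, applied to paired satisfaction labels. The function name `cohen_kappa` and the label values are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(direct, indirect):
    """Chance-corrected agreement between two paired label sequences.

    Assumption: each position holds the satisfaction label given to the
    same conversation by a direct assessor and by an indirect assessor.
    """
    if len(direct) != len(indirect) or not direct:
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(direct)
    # Observed agreement: fraction of conversations with identical labels.
    observed = sum(a == b for a, b in zip(direct, indirect)) / n

    # Expected agreement if the two assessors labelled independently,
    # estimated from each assessor's own label frequencies.
    direct_counts = Counter(direct)
    indirect_counts = Counter(indirect)
    expected = sum(
        direct_counts[label] * indirect_counts[label] for label in direct_counts
    ) / (n * n)

    if expected == 1.0:
        # Both assessors used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical labels: 1 = satisfied, 0 = not satisfied.
    direct_labels = [1, 1, 0, 0]
    indirect_labels = [1, 0, 0, 0]
    print(cohen_kappa(direct_labels, indirect_labels))
```

A value near 1 indicates agreement well above chance, while a value near 0 indicates agreement no better than chance. Other statistics, such as Krippendorff's alpha or a rank correlation, would be equally plausible choices for graded satisfaction scales.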
Pages: 196-201
Number of pages: 6
Related papers
50 in total
  • [31] CONVERSATIONAL SYSTEMS
    BUTLER, D
    DATA PROCESSING, 1972, 14 (03): : 177 - &
  • [32] SEARCH FOR A PARADIGM
    WIEBE, P
    JOURNAL OF RELIGION, 1984, 64 (03): : 348 - 362
  • [33] IN SEARCH OF A PARADIGM
    HANKISS, E
    DAEDALUS, 1990, 119 (01) : 183 - 214
  • [34] The search for a new paradigm for the development of national agricultural research systems
    Byerlee, D
    WORLD DEVELOPMENT, 1998, 26 (06) : 1049 - 1055
  • [35] Embedding Search into a Conversational Platform to Support Collaborative Search
    Avula, Sandeep
    Arguello, Jaime
    Capra, Robert
    Dodson, Jordan
    Huang, Yuhui
    Radlinski, Filip
    PROCEEDINGS OF THE 2019 CONFERENCE ON HUMAN INFORMATION INTERACTION AND RETRIEVAL (CHIIR'19), 2019, : 15 - 23
  • [36] The importance of search strategies in evaluating the usability of information systems
    Jongman, GMG
    Sikken, JA
    COGNITIVE ERGONOMICS, CLINICAL ASSESSMENT AND COMPUTER-ASSISTED LEARNING, 1999, 6 : 3 - 14
  • [37] A Model of Exploratory Search for Evaluating its Systems & Applications
    Palagi, Emilie
    Troncy, Raphael
    Giboin, Alain
    Gandon, Fabien
    ACTES DE LA 30 CONFERENCE FRANCOPHONE SUR L'INTERACTION HOMME-MACHINE - (IHM 2018), 2018, : 234 - 240
  • [38] Reflections on five years of evaluating semantic search systems
    Uren V.
    Sabou M.
    Motta E.
    Fernandez M.
    Lopez V.
    Lei Y.
    International Journal of Metadata, Semantics and Ontologies, 2010, 5 (02) : 97 - 98
  • [39] Behavior Alignment: A New Perspective of Evaluating LLM-based Conversational Recommendation Systems
    Yang, Dayu
    Chen, Fumian
    Fang, Hui
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 2286 - 2290
  • [40] Caching Historical Embeddings in Conversational Search
    Frieder, Ophir
    Mele, Ida
    Muntean, Cristina Ioana
    Nardini, Franco Maria
    Perego, Raffaele
    Tonellotto, Nicola
    ACM TRANSACTIONS ON THE WEB, 2024, 18 (04)