Interactive probes: Towards action-level evaluation for dialogue systems

Cited by: 3
Authors
Liesenfeld, Andreas [1]
Dingemanse, Mark [2]
Affiliations
[1] Radboud Univ Nijmegen, Ctr Language Studies, Erasmuspl 1, NL-6525 HT Nijmegen, Netherlands
[2] Radboud Univ Nijmegen, Nijmegen, Netherlands
Keywords
Applied conversation analysis; conversational user interfaces; dialogue systems; usability testing; REPAIR
DOI
10.1177/17504813241267071
Chinese Library Classification number
G2 [Information and Knowledge Dissemination]
Discipline classification codes
05; 0503
Abstract
Measures of 'humanness', 'coherence' or 'fluency' are the mainstay of dialogue system evaluation, but they don't target system capabilities and rarely offer actionable feedback. Reviewing recent work in this domain, we identify an opportunity for evaluation at the level of action sequences, rather than the more commonly targeted levels of whole conversations or single responses. We introduce interactive probes, an evaluation framework inspired by empirical work on social interaction that can help to systematically probe the capabilities of dialogue systems. We sketch some first probes in the domains of tellings and repair, two sequence types ubiquitous in human interaction and challenging for dialogue systems. We argue interactive probing can offer the requisite flexibility to keep up with developments in interactive language technologies and do justice to the open-endedness of action formation and ascription in interaction.
Pages: 954-964
Page count: 11