Probing the Robustness of Trained Metrics for Conversational Dialogue Systems

Authors
Deriu, Jan [1]
Tuggener, Don [1]
von Daeniken, Pius [1]
Cieliebak, Mark [1]
Affiliations
[1] Zurich Univ Appl Sci ZHAW, Winterthur, Switzerland
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces an adversarial method for stress-testing the trained metrics used to evaluate conversational dialogue systems. The method leverages Reinforcement Learning to find response strategies that elicit optimal scores from the trained metrics. We apply our method to test recently proposed trained metrics. We find that all of them are susceptible to giving high scores to responses generated by the relatively simple and obviously flawed strategies that our method converges on. For instance, simply copying parts of the conversation context to form a response yields scores that are competitive with, or even higher than, those of responses written by humans.
Pages: 750 - 761
Page count: 12
Related papers
50 in total
  • [11] Vision Powered Conversational AI for Easy Human Dialogue Systems
    Basnyat, Bipendra
    Singh, Neha
    Roy, Nirmalya
    Gangopadhyay, Aryya
    2020 IEEE 17TH INTERNATIONAL CONFERENCE ON MOBILE AD HOC AND SMART SYSTEMS (MASS 2020), 2020, : 684 - 692
  • [12] A Comparison of Robustness Metrics for Scheduling DAGs on Heterogeneous Systems
    Canon, Louis-Claude
    Jeannot, Emmanuel
    2007 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING, 2007, : 558 - 567
  • [13] Evaluation of Robustness Metrics for Defense of Machine Learning Systems
    DeMarchi, J.
    Rijken, R.
    Melrose, J.
    Madahar, B.
    Fumera, G.
    Roli, F.
    Ledda, E.
    Aktas, M.
    Kurth, F.
    Baggenstoss, P.
    Pelzer, B.
    Kanestad, L.
    2023 INTERNATIONAL CONFERENCE ON MILITARY COMMUNICATIONS AND INFORMATION SYSTEMS, ICMCIS, 2023,
  • [14] Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems
    Deriu, Jan
    Tuggener, Don
    Von Daniken, Pius
    Campos, Jon Ander
    Rodrigo, Alvaro
    Belkacem, Thiziri
    Soroa, Aitor
    Agirre, Eneko
    Cieliebak, Mark
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 3971 - 3984
  • [15] The Role of Dialogue User Data in the Information Interaction Design of Conversational Systems
    Candello, Heloisa
    Pinhanez, Claudio
    DESIGN, USER EXPERIENCE, AND USABILITY: DESIGNING INTERACTIONS, DUXU 2018, PT II, 2018, 10919 : 414 - 426
  • [16] A RERANKING APPROACH FOR RECOGNITION AND CLASSIFICATION OF SPEECH INPUT IN CONVERSATIONAL DIALOGUE SYSTEMS
    Morbini, Fabrizio
    Audhkhasi, Kartik
    Artstein, Ron
    Van Segbroeck, Maarten
    Sagae, Kenji
    Georgiou, Panayiotis
    Traum, David R.
    Narayanan, Shri
    2012 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2012), 2012, : 49 - 54
  • [17] Monologue and dialogue conversational styles
    Burger, H
    ZEITSCHRIFT FUR DIALEKTOLOGIE UND LINGUISTIK, 1999, 66 (01): : 69 - 70
  • [18] Conversational evidence in therapeutic dialogue
    Strong, Tom
    Busch, Robbie
    Couture, Shari
    JOURNAL OF MARITAL AND FAMILY THERAPY, 2008, 34 (03) : 388 - 405
  • [19] Using knowledge of misunderstandings to increase the robustness of spoken dialogue systems
    Lopez-Cozar, Ramon
    Callejas, Zoraida
    Griol, David
    KNOWLEDGE-BASED SYSTEMS, 2010, 23 (05) : 471 - 485
  • [20] Sustainability, robustness, and resilience metrics for water and other infrastructure systems
    Huizar, Luis H.
    Lansey, Kevin E.
    Arnold, Robert G.
    SUSTAINABLE AND RESILIENT INFRASTRUCTURE, 2018, 3 (01) : 16 - 35