A Study of Automatic Metrics for the Evaluation of Natural Language Explanations

Cited by: 0
Authors
Clinciu, Miruna-Adriana [1, 2, 3]
Eshghi, Arash [2]
Hastie, Helen [2]
Affiliations
[1] Edinburgh Centre for Robotics, Edinburgh, Midlothian, Scotland
[2] Heriot-Watt University, Edinburgh, Midlothian, Scotland
[3] University of Edinburgh, Edinburgh, Midlothian, Scotland
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
As transparency becomes key for robotics and AI, it will be necessary to evaluate the methods through which transparency is provided, including automatically generated natural language (NL) explanations. Here, we explore parallels between the generation of such explanations and the much-studied field of Natural Language Generation (NLG) evaluation. Specifically, we investigate which NLG evaluation measures map well to explanations. We present the ExBAN corpus: a crowd-sourced corpus of NL explanations for Bayesian Networks. We compute correlations between human subjective ratings and automatic NLG measures. We find that embedding-based automatic NLG evaluation methods, such as BERTScore and BLEURT, correlate more strongly with human ratings than word-overlap metrics, such as BLEU and ROUGE. This work has implications for Explainable AI and for transparent robotic and autonomous systems.
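The core measurement described in the abstract is a correlation between automatic metric scores and human ratings of the same explanations. The sketch below illustrates that computation under stated assumptions: it uses Python 3.9+ and only the standard library, a simple unigram-overlap F1 stands in for word-overlap metrics such as BLEU and ROUGE (it is neither metric), and Spearman's rank correlation is assumed as the correlation coefficient, since the abstract does not name one. The function names and the toy explanations and ratings are hypothetical and are not taken from the ExBAN corpus.

```python
from collections import Counter
from math import sqrt


def unigram_f1(candidate: str, reference: str) -> float:
    """Hypothetical word-overlap F1: a simplified stand-in for
    BLEU/ROUGE-style metrics, not an implementation of either."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Clipped overlap: each word counts at most as often as it occurs in both texts.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data (not from ExBAN): one reference explanation,
# three candidate explanations, and invented 1-5 human ratings.
reference = "rain makes the grass wet because the sprinkler is off"
candidates = [
    "the grass is wet because it rained and the sprinkler is off",
    "rain causes wet grass",
    "the sprinkler is broken",
]
human_ratings = [5, 4, 1]

metric_scores = [unigram_f1(c, reference) for c in candidates]
print([round(s, 3) for s in metric_scores])
print(round(spearman(metric_scores, human_ratings), 3))
```

Replacing `unigram_f1` with an embedding-based scorer such as BERTScore or BLEURT would leave the correlation step unchanged. In this toy case the overlap score assigns the same value to the short partial paraphrase and to the off-target answer, and the average-rank step treats the two as a tie.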
Pages: 2376-2387
Number of pages: 12