Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics

Cited by: 0
Authors
Deutsch, Daniel [1]
Roth, Dan [1]
Affiliations
[1] Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA
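Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract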
Question answering-based summarization evaluation metrics must automatically determine whether the QA model's prediction is correct, a task known as answer verification. In this work, we benchmark the lexical answer verification methods that have been used by current QA-based metrics, as well as two more sophisticated text comparison methods, BERTScore and LERC. We find that LERC outperforms the other methods in some settings while remaining statistically indistinguishable from lexical overlap in others. However, our experiments reveal that improved verification performance does not necessarily translate to overall QA-based metric quality: in some scenarios, using a worse verification method - or using none at all - performs comparably to using the best verification method, a result that we attribute to properties of the datasets.
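As an illustration of what lexical answer verification can look like, the sketch below scores a QA model's predicted answer against a reference answer using exact match and token-level F1. The abstract does not name the specific lexical methods it benchmarks; these two are assumed here because they are widely used in QA evaluation. The function names and the normalization rules (lowercasing, dropping punctuation and English articles) are illustrative choices, not the authors' implementation.

```python
import re
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, replace punctuation with spaces, drop English articles, tokenize."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in {"a", "an", "the"}]


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between the two answers."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, `exact_match("The Eiffel Tower", "eiffel tower")` treats the two answers as identical after normalization, while `token_f1("Barack Obama", "Obama")` gives partial credit because only one of the two predicted tokens appears in the reference. Learned comparison methods such as BERTScore and LERC replace this surface overlap with model-based similarity.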
Pages: 3759-3765
Number of pages: 7