Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics

Authors
Deutsch, Daniel [1]
Roth, Dan [1]
Affiliation
[1] Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Question answering-based summarization evaluation metrics must automatically determine whether the QA model's prediction is correct, a task known as answer verification. In this work, we benchmark the lexical answer verification methods used by current QA-based metrics as well as two more sophisticated text comparison methods, BERTScore and LERC. We find that LERC outperforms the other methods in some settings while remaining statistically indistinguishable from lexical overlap in others. However, our experiments reveal that improved verification performance does not necessarily translate to overall QA-based metric quality: in some scenarios, using a worse verification method, or using none at all, performs comparably to using the best verification method, a result that we attribute to properties of the datasets.
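To make "lexical answer verification" concrete, the following is a minimal sketch of two lexical comparisons that are common in QA evaluation: exact match and token-level F1. The abstract does not name the specific lexical methods that were benchmarked, so these two choices, the normalization steps (lowercasing, removing punctuation and English articles), and the function names `normalize`, `exact_match`, and `token_f1` are assumptions made for illustration, not the authors' implementation.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace.

    This normalization is an assumption for the sketch, not taken from the paper.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between the two answers."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # Two empty answers agree; one empty and one non-empty do not.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

With this sketch, a prediction of "Barack Obama" against a reference of "Obama" fails exact match but receives partial credit from token F1. Learned comparison methods such as BERTScore and LERC aim to go beyond surface overlap of this kind.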
Pages: 3759-3765
Page count: 7