Benchmarking question answering systems

Cited: 16
Authors
Usbeck, Ricardo [1 ]
Roeder, Michael [1 ]
Hoffmann, Michael [3 ]
Conrads, Felix [1 ]
Huthmann, Jonathan [3 ]
Ngonga-Ngomo, Axel-Cyrille [1 ]
Demmler, Christian [3 ]
Unger, Christina [2 ]
Affiliations
[1] Paderborn Univ, DICE Data Sci Grp, Paderborn, Germany
[2] Univ Bielefeld, CITEC, Bielefeld, Germany
[3] Univ Leipzig, AKSW Grp, Leipzig, Germany
Funding
European Union Horizon 2020
Keywords
Factoid question answering; benchmarking; repeatable open research
DOI
10.3233/SW-180312
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The necessity of making the Semantic Web more accessible to lay users, alongside the uptake of interactive systems and smart assistants for the Web, has spawned a new generation of RDF-based question answering (QA) systems. However, the fair evaluation of these systems remains a challenge due to the different types of answers that they provide. Hence, repeating published experiments, or even benchmarking on the same datasets, remains a complex and time-consuming task. We present a novel online benchmarking platform for QA that relies on the FAIR principles to support the fine-grained evaluation of QA systems. We detail how the platform addresses the fair benchmarking of QA systems through the rewriting of URIs and URLs. In addition, we provide different evaluation metrics and measures, datasets, and pre-implemented systems, as well as methods for working with novel formats for the interactive and non-interactive benchmarking of QA systems. Our analysis shows that most current frameworks are tailored towards particular datasets and challenges and do not provide generic models. Moreover, while most frameworks perform well in the annotation of entities and properties, the generation of SPARQL queries from annotated text remains a challenge.
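To make the kind of fine-grained evaluation described above concrete, the following is a minimal Python sketch of the macro-averaged precision, recall, and F-measure commonly used by QA benchmarks such as QALD: per-question scores are computed over gold vs. system answer sets and then averaged across the dataset. This is an illustration of the metric family only, not the platform's actual code; all function names and data below are hypothetical.

    # Minimal sketch (hypothetical, not the platform's code): macro-averaged
    # precision, recall, and F-measure over per-question answer sets.

    def answer_prf(gold, system):
        """Precision, recall, F1 for one question's gold vs. system answer sets."""
        if not gold and not system:
            return 1.0, 1.0, 1.0  # both empty: treated as a perfect match here
        if not gold or not system:
            return 0.0, 0.0, 0.0
        tp = len(gold & system)  # answers the system got right
        p = tp / len(system)
        r = tp / len(gold)
        f = 2 * p * r / (p + r) if p + r > 0 else 0.0
        return p, r, f

    def macro_average(pairs):
        """Average the per-question scores over the whole dataset (macro)."""
        scores = [answer_prf(g, s) for g, s in pairs]
        return tuple(sum(col) / len(scores) for col in zip(*scores))

    # Hypothetical gold/system answer pairs (DBpedia-style URIs abbreviated).
    pairs = [
        ({"dbr:Berlin"}, {"dbr:Berlin"}),              # fully correct
        ({"dbr:Angela_Merkel"}, {"dbr:Germany"}),      # wrong answer
        ({"dbr:Rhine", "dbr:Danube"}, {"dbr:Rhine"}),  # partially correct
    ]
    print("macro P/R/F1: %.3f %.3f %.3f" % macro_average(pairs))

Micro-averaged variants, which pool true positives across all questions before dividing, can be derived from the same per-question counts; the convention for empty answer sets is a design choice that differs between benchmarks.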
Pages: 293-304
Number of pages: 12
Related papers
50 items in total
  • [41] Evaluating Multilingual Question Answering Systems at CLEF
    Forner, Pamela
    Giampiccolo, Danilo
    Magnini, Bernardo
    Penas, Anselmo
    Rodrigo, Alvaro
    Sutcliffe, Richard
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010, : 2774 - 2781
  • [42] Interactive Question Answering Systems: Literature Review
    Biancofiore, Giovanni Maria
    Deldjoo, Yashar
    Di Noia, Tommaso
    Di Sciascio, Eugenio
    Narducci, Fedelucio
    ACM COMPUTING SURVEYS, 2024, 56 (09)
  • [43] Arabic Question Answering Systems: Gap Analysis
    Biltawi, Mariam M.
    Tedmori, Sara
    Awajan, Arafat
    IEEE ACCESS, 2021, 9 : 63876 - 63904
  • [44] Development of an evaluation model for Question Answering Systems
    Ong, Chorng-Shyong
    Day, Min-Yuh
    Hsu, Wen-Lian
PROCEEDINGS OF THE 2008 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION, 2008, : 178+
  • [45] Exploiting Opinion Influence in Question Answering Systems
    Cercel, Dumitru-Clementin
    Onose, Cristian
    Trausan-Matu, Stefan
    Pop, Florin
    2017 IEEE 29TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2017), 2017, : 197 - 201
  • [46] A survey of consumer health question answering systems
    Welivita, Anuradha
    Pu, Pearl
    AI MAGAZINE, 2023, 44 (04) : 482 - 507
  • [47] A Quantitative Evaluation of Natural Language Question Interpretation for Question Answering Systems
    Asakura, Takuto
    Kim, Jin-Dong
    Yamamoto, Yasunori
    Tateisi, Yuka
    Takagi, Toshihisa
    SEMANTIC TECHNOLOGY (JIST 2018), 2018, 11341 : 215 - 231
  • [48] A Hybrid Approach for Question Classification in Persian Automatic Question Answering Systems
    Sherkat, Ehsan
    Farhoodi, Mojgan
    2014 4TH INTERNATIONAL CONFERENCE ON COMPUTER AND KNOWLEDGE ENGINEERING (ICCKE), 2014, : 279 - 284
  • [49] Multilingual Question Answering Systems: Question Classification in Spanish based in Learning
    Garcia Cumbreras, Miguel Angel
    Martinez Santiago, Fernando
    Alfonso Urena Lopez, L.
    Montejo Raez, Arturo
PROCESAMIENTO DEL LENGUAJE NATURAL, 2005, (34)
  • [50] Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics
    Deutsch, Daniel
    Roth, Dan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 3759 - 3765