SESAME - self-supervised framework for extractive question answering over document collections

Cited by: 0
Authors
Batista, Vitor A. [1 ,2 ]
Gomes, Diogo S. M. [2 ]
Evsukoff, Alexandre [1 ]
Affiliations
[1] Fed Univ Rio Janeiro, PEC Coppe, POB 68506, BR-21941972 Rio De Janeiro, RJ, Brazil
[2] PETROBRAS SA, Rua Gen Canabarro, 500, Rio De Janeiro, RJ, Brazil
Keywords
Question answering; NLP; Neural networks; Transformers; LLM
DOI
10.1007/s10844-024-00869-6
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Question Answering is one of the most relevant areas in the field of Natural Language Processing, evolving rapidly with promising results due to the increasing availability of suitable datasets and the advent of new technologies such as generative models. This article introduces SESAME, a Self-supervised framework for Extractive queStion Answering over docuMent collEctions. SESAME aims to enhance open-domain question answering (ODQA) systems by leveraging domain adaptation with synthetic datasets, enabling efficient question answering over private document collections with low resource usage. The framework incorporates recent advances in large language models and an efficient hybrid method for context retrieval. We conducted several sets of experiments with the Machine Reading for Question Answering (MRQA) 2019 Shared Task datasets, FaQuAD (a Brazilian Portuguese reading comprehension dataset), Wikipedia, and the Retrieval-Augmented Generation Benchmark to demonstrate SESAME's effectiveness. The results indicate that SESAME's domain adaptation using synthetic data significantly improves QA performance, generalizes across domains and languages, and competes with or surpasses state-of-the-art ODQA systems. Finally, SESAME is an open-source tool, and all code, datasets, and experimental data are publicly available in our repository.
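The abstract mentions an efficient hybrid method for context retrieval. The paper's exact fusion method is not specified in this record; a common way to combine a lexical ranker (e.g. BM25) with a dense retriever is reciprocal rank fusion (RRF), sketched below as a hypothetical illustration. The document ids and rankings are invented for the example.

```python
# Hypothetical sketch of hybrid context retrieval via reciprocal rank
# fusion (RRF). SESAME's actual fusion strategy may differ; this only
# illustrates how a lexical and a dense ranking can be merged.

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one fused ranking.

    Each document's fused score is the sum of 1 / (k + rank) over every
    list it appears in; k=60 is the constant from the original RRF paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy rankings, e.g. from a BM25 index and a dense retriever.
lexical = ["d3", "d1", "d7"]
dense = ["d1", "d9", "d3"]
fused = reciprocal_rank_fusion([lexical, dense])
print(fused)  # documents found by both retrievers rank first
```

Because RRF works on ranks rather than raw scores, it needs no score normalization between the two retrievers, which keeps the fusion step cheap and retriever-agnostic.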
Pages: 23