Automatic Spanish Translation of the SQuAD Dataset for Multilingual Question Answering

Cited by: 0
Authors
Carrino, Casimiro Pio [1]
Costa-jussa, Marta R. [1]
Fonollosa, Jose A. R. [1]
Affiliation
[1] Univ Politecn Cataluna, TALP Res Ctr, Barcelona, Spain
Keywords
Question Answering; Multilinguality; Corpus Creation
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Multilingual question answering has recently become a crucial research topic and is receiving increasing interest in the NLP community. However, the lack of large-scale datasets makes it challenging to train multilingual QA systems whose performance is comparable to that of English systems. In this work, we develop the Translate Align Retrieve (TAR) method to automatically translate the Stanford Question Answering Dataset (SQuAD) v1.1 into Spanish. We then use this dataset to train Spanish QA systems by fine-tuning a Multilingual-BERT model. Finally, we evaluate our QA models on the recently proposed MLQA and XQuAD benchmarks for cross-lingual extractive QA. Experimental results show that our models outperform the previous Multilingual-BERT baselines, achieving new state-of-the-art results of 68.1 F1 on the Spanish MLQA corpus and 77.6 F1 on the Spanish XQuAD corpus. The resulting synthetically generated corpus, SQuAD-es v1.1, retains almost 100% of the data in the original English version and is, to the best of our knowledge, the first large-scale QA training resource for Spanish.
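The F1 figures reported in the abstract are token-overlap scores between a predicted answer span and the reference answer(s), as in extractive QA benchmarks of the SQuAD family. The sketch below illustrates a SQuAD-style version of that metric. The normalization used here (lowercasing, stripping punctuation, dropping a few Spanish articles) and the helper names `normalize`, `token_f1` and `best_f1` are simplified, hypothetical choices; they are not claimed to match the official MLQA or XQuAD evaluation scripts exactly.

```python
import string
from collections import Counter

# Assumption: ASCII punctuation plus Spanish inverted marks are stripped.
PUNCT = set(string.punctuation) | {"¿", "¡"}
# Assumption: a small, illustrative set of Spanish articles is ignored.
SPANISH_ARTICLES = {"el", "la", "los", "las", "un", "una", "unos", "unas"}


def normalize(text: str) -> list[str]:
    """Lowercase, remove punctuation, split on whitespace, drop articles."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCT)
    return [tok for tok in text.split() if tok not in SPANISH_ARTICLES]


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall for one answer pair."""
    pred_toks = normalize(prediction)
    gold_toks = normalize(gold)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction: str, golds: list[str]) -> float:
    """Score against every reference answer and keep the best match."""
    return max(token_f1(prediction, g) for g in golds)


if __name__ == "__main__":
    # "la" is dropped; 1 shared token out of 3 predicted, 1 gold -> F1 = 0.5
    print(token_f1("la ciudad de Barcelona", "Barcelona"))
```

Under these assumptions, a corpus-level score such as those quoted would be the mean of `best_f1` over all questions, expressed as a percentage.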
Pages: 5515-5523
Number of pages: 9
Related papers (50 in total)
  • [41] A Portuguese Dataset for Evaluation of Semantic Question Answering
    de Araujo, Denis Andrei
    Rigo, Sandro Jose
    Quaresma, Paulo
    Muniz, Joao Henrique
    [J]. COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2020, 2020, 12037 : 217 - 227
  • [42] AQA: a multilingual Anaphora annotation scheme for Question Answering
    Boldrini, E.
    Puchol-Blasco, M.
    Navarro, B.
    Martinez-Barco, P.
    Vargas-Sierra, C.
    [J]. PROCESAMIENTO DEL LENGUAJE NATURAL, 2009, (42): : 97 - 104
  • [43] Architecture and evaluation of BRUJA, a multilingual question answering system
    García-Cumbreras, M. Á.
    Martínez-Santiago, F.
    Ureña-López, L. A.
    [J]. INFORMATION RETRIEVAL, 2012, 15 : 413 - 432
  • [44] Demoing Platypus - A Multilingual Question Answering Platform for Wikidata
    Tanon, Thomas Pellissier
    de Assuncao, Marcos Dias
    Caron, Eddy
    Suchanek, Fabian M.
    [J]. SEMANTIC WEB: ESWC 2018 SATELLITE EVENTS, 2018, 11155 : 111 - 116
  • [45] Towards End-to-End Multilingual Question Answering
    Loginova, Ekaterina
    Varanasi, Stalin
    Neumann, Guenter
    [J]. INFORMATION SYSTEMS FRONTIERS, 2021, 23 (01) : 227 - 241
  • [46] Multilingual question answering with high portability on relational databases
    Jung, HM
    Lee, GG
    Choi, WS
    Min, K
    Seo, J
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2003, E86D (02) : 306 - 315
  • [47] Overview of the CLEF 2005 Multilingual Question Answering Track
    Vallin, Alessandro
    Magnini, Bernardo
    Giampiccolo, Danilo
    Aunimo, Lili
    Ayache, Christelle
    Osenova, Petya
    Penas, Anselmo
    de Rijke, Maarten
    Sacaleanu, Bogdan
    Santos, Diana
    Sutcliffe, Richard
    [J]. ACCESSING MULTILINGUAL INFORMATION REPOSITORIES, 2006, 4022 : 307 - 331
  • [49] Overview of the CLEF 2008 Multilingual Question Answering Track
    Forner, Pamela
    Penas, Anselmo
    Agirre, Eneko
    Alegria, Inaki
    Forascu, Corina
    Moreau, Nicolas
    Osenova, Petya
    Prokopidis, Prokopis
    Rocha, Paulo
    Sacaleanu, Bogdan
    Sutcliffe, Richard
    Sang, Erik Tjong Kim
    [J]. EVALUATING SYSTEMS FOR MULTILINGUAL AND MULTIMODAL INFORMATION ACCESS, 2009, 5706 : 262 - +
  • [50] A Lexical Approach for Spanish Question Answering
    Tellez, Alberto
    Juarez, Antonio
    Hernandez, Gustavo
    Denicia, Claudia
    Villatoro, Esau
    Montes, Manuel
    Villasenor, Luis
    [J]. ADVANCES IN MULTILINGUAL AND MULTIMODAL INFORMATION RETRIEVAL, 2008, 5152 : 328 - 331