Enabling deep learning for large scale question answering in Italian

Cited by: 2
Authors
Croce, Danilo [1]
Zelenanska, Alexandra [1]
Basili, Roberto [1]
Affiliations
[1] Univ Roma Tor Vergata, Dept Enterprise Engn, Rome, Italy
Keywords
Question answering in Italian; deep learning; recurrent neural network with attention
DOI
10.3233/IA-190018
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recent breakthroughs in deep learning have led to state-of-the-art results in several NLP tasks, such as Question Answering (QA). Unfortunately, such neural QA systems have very strict requirements because of the size of the training datasets involved. In cross-linguistic settings these requirements are not satisfied, as training datasets for QA over non-English texts are often not available. This represents the major barrier to widespread adoption of neural QA methods in NLP applications. This paper discusses the acquisition of a large-scale dataset for an open-domain factoid question answering system in Italian. It is obtained by automatic translation and linguistic elicitation of an existing English dataset, i.e. the SQUAD question-answer pair corpus. Even though the quality of the resulting corpus for Italian might not be completely satisfying, our work made it possible to generate more than 60 thousand question-answer pairs. The paper studies the impact of this resource on the QA process over the Italian Wikipedia, under different training conditions and architectural constraints. A comparative evaluation against the English version, in line with standards in the SQUAD literature, is carried out. The outcomes show that the results achievable for Italian are below the state of the art for English, but the ability to learn not to respond (i.e. the adoption of techniques for detecting questions whose answers are simply not available, corresponding to an EMPTY set of answers) allows the system to reach reasonable levels of precision. This makes it already usable in realistic application scenarios. Finally, an error analysis is presented that suggests possible future research directions on still critical but highly beneficial enhancements, in view of concrete QA applications in Italian.
Pages: 49-61
Number of pages: 13