Building a deep learning-based QA system from a CQA dataset

Cited by: 2
Authors
Jin, Sol [1]
Lian, Xu [1]
Jung, Hanearl [1]
Park, Jinsoo [1]
Suh, Jihae [2]
Affiliations
[1] Seoul Natl Univ, Coll Business Adm, Seoul, South Korea
[2] Seoul Natl Univ Sci & Technol, Coll Business Adm, Seoul, South Korea
Keywords
Question answering (QA) system; Community question answering (CQA); BERT; T5; DECISION-SUPPORT; QUESTION; ANSWERS;
DOI
10.1016/j.dss.2023.114038
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Training the answer extraction part of existing Question Answering (QA) systems requires a man-made machine reading comprehension (MRC) dataset. However, a high-quality, well-structured dataset of question-paragraph-answer triples is rarely available in practice, and building or updating an MRC dataset is challenging and costly. To address these shortcomings, we propose a QA system that uses a large-scale English Community Question Answering (CQA) dataset (i.e., Stack Exchange) composed of 3,081,834 question-answer pairs. The QA system adopts a classifier-retriever-summarizer design. The question classifier and the answer retriever are based on Google's Bidirectional Encoder Representations from Transformers (BERT) Natural Language Processing (NLP) model, and the summarizer uses a deep learning-based Text-to-Text Transfer Transformer (T5) model to summarize long answers. We instantiated the proposed QA system with 140 topics from the CQA dataset (including biology, law, and politics) and conducted human and automatic evaluations. The results are encouraging: the system provides high-quality answers to the questions in the test set and satisfies the requirements for developing a QA system without MRC datasets. Our results show the potential of building automatic, high-performance QA systems that are not limited by man-made datasets, a significant step forward in research on open-domain and specific-domain QA systems.
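The classifier-retriever-summarizer structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: bag-of-words cosine similarity stands in for the BERT-based classifier and retriever, a first-sentence heuristic stands in for the T5 summarizer, and all names (`QAPair`, `classify_topic`, `retrieve_answer`, `summarize`, `answer_question`, `CORPUS`) and the toy data are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class QAPair:
    topic: str      # CQA topic label (hypothetical toy values below)
    question: str
    answer: str


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify_topic(question, corpus):
    """Stand-in for the question classifier: pick the topic whose stored
    questions are, in total, most similar to the input question."""
    q = Counter(tokenize(question))
    scores = {}
    for pair in corpus:
        sim = cosine(q, Counter(tokenize(pair.question)))
        scores[pair.topic] = scores.get(pair.topic, 0.0) + sim
    return max(scores, key=scores.get)


def retrieve_answer(question, topic, corpus):
    """Stand-in for the answer retriever: within the predicted topic,
    return the pair whose question is most similar to the input."""
    q = Counter(tokenize(question))
    candidates = [p for p in corpus if p.topic == topic]
    return max(candidates, key=lambda p: cosine(q, Counter(tokenize(p.question))))


def summarize(answer, max_sentences=1):
    """Stand-in for the summarizer: keep only the leading sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", answer.strip())
    return " ".join(sentences[:max_sentences])


def answer_question(question, corpus):
    """Run the three stages in order: classify, retrieve, summarize."""
    topic = classify_topic(question, corpus)
    best = retrieve_answer(question, topic, corpus)
    return topic, summarize(best.answer)


# Hypothetical toy corpus; the paper uses Stack Exchange question-answer pairs.
CORPUS = [
    QAPair("biology", "Why do leaves change color in autumn?",
           "Chlorophyll breaks down in autumn. This reveals other pigments."),
    QAPair("biology", "How do vaccines train the immune system?",
           "Vaccines expose the body to a harmless antigen. Memory cells respond faster later."),
    QAPair("law", "What is the difference between a patent and a copyright?",
           "A patent protects inventions. A copyright protects creative expression."),
]

if __name__ == "__main__":
    print(answer_question("Why do leaves turn color in autumn?", CORPUS))
```

Restricting retrieval to the predicted topic mirrors the staged design in the abstract; in a real system each stand-in would be replaced by a trained model.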
Pages: 12