Building a deep learning-based QA system from a CQA dataset

Cited by: 2
Authors
Jin, Sol [1]
Lian, Xu [1]
Jung, Hanearl [1]
Park, Jinsoo [1]
Suh, Jihae [2]
Affiliations
[1] Seoul Natl Univ, Coll Business Adm, Seoul, South Korea
[2] Seoul Natl Univ Sci & Technol, Coll Business Adm, Seoul, South Korea
Keywords
Question answering (QA) system; Community question answering (CQA); BERT; T5; DECISION-SUPPORT; QUESTION; ANSWERS;
DOI
10.1016/j.dss.2023.114038
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Training the answer-extraction component of existing Question Answering (QA) systems requires a man-made machine reading comprehension (MRC) dataset. However, high-quality, well-structured datasets of question-paragraph-answer triples are rarely available in the real world, and building or updating an MRC dataset is challenging and costly. To address these shortcomings, we propose a QA system that uses a large-scale English Community Question Answering (CQA) dataset (i.e., Stack Exchange) composed of 3,081,834 question-answer pairs. The system adopts a classifier-retriever-summarizer design. The question classifier and the answer retriever are based on Google's Bidirectional Encoder Representations from Transformers (BERT) model, and the summarizer uses a Text-to-Text Transfer Transformer (T5) model to summarize long answers. We instantiated the proposed QA system with 140 topics from the CQA dataset (including biology, law, and politics) and conducted human and automatic evaluations. The results are encouraging: the system provides high-quality answers to the questions in the test set and satisfies the requirements for developing a QA system without MRC datasets. Our results show the potential of building automatic, high-performance QA systems that are not limited by man-made datasets, a significant step forward in research on open-domain or domain-specific QA systems.
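The classifier-retriever-summarizer flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: bag-of-words cosine similarity stands in for the BERT-based classifier and retriever, a first-sentence cut stands in for the T5 summarizer, and the names `QAPair`, `CORPUS`, `classify`, `retrieve`, `summarize`, and `answer_question`, as well as the toy question-answer pairs, are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class QAPair:
    topic: str      # CQA topic label (e.g. a Stack Exchange site name)
    question: str
    answer: str


# Toy stand-in for the CQA dataset; these pairs are invented for illustration.
CORPUS = [
    QAPair("biology", "Why do leaves change color in autumn?",
           "Chlorophyll breaks down as days shorten. "
           "This reveals yellow and orange pigments that were present all along."),
    QAPair("law", "What is the difference between a misdemeanor and a felony?",
           "A felony is a more serious crime than a misdemeanor. "
           "Penalties differ by jurisdiction."),
    QAPair("biology", "How do vaccines train the immune system?",
           "Vaccines expose the immune system to a harmless antigen. "
           "Memory cells then respond faster on real infection."),
]


def bag_of_words(text: str) -> Counter:
    """Lowercased token counts; a crude substitute for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def classify(question: str, corpus: list) -> str:
    """Stage 1 (stand-in for the BERT classifier): pick the best-matching topic."""
    q = bag_of_words(question)
    scores = {}
    for pair in corpus:
        score = cosine(q, bag_of_words(pair.question))
        scores[pair.topic] = max(scores.get(pair.topic, 0.0), score)
    return max(scores, key=scores.get)


def retrieve(question: str, topic: str, corpus: list) -> QAPair:
    """Stage 2 (stand-in for the BERT retriever): closest stored question in the topic."""
    q = bag_of_words(question)
    candidates = [p for p in corpus if p.topic == topic]
    return max(candidates, key=lambda p: cosine(q, bag_of_words(p.question)))


def summarize(answer: str, max_sentences: int = 1) -> str:
    """Stage 3 (stand-in for the T5 summarizer): keep the leading sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", answer.strip())
    return " ".join(sentences[:max_sentences])


def answer_question(question: str, corpus: list) -> str:
    """Run the three stages in order: classify, retrieve, summarize."""
    topic = classify(question, corpus)
    best = retrieve(question, topic, corpus)
    return summarize(best.answer)


print(answer_question("Why do leaves turn color in the autumn?", CORPUS))
```

Classifying first narrows retrieval to one topic, which is presumably why a staged design suits a corpus spanning 140 topics. In a real system each stand-in function would be replaced by a trained model.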
Pages: 12