Improving Biomedical Question Answering by Data Augmentation and Model Weighting

Cited by: 2
Authors
Du, Yongping [1]
Yan, Jingya [1]
Lu, Yuxuan [1]
Zhao, Yiliang [1]
Jin, Xingnan [1]
Affiliation
[1] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
Funding
Beijing Natural Science Foundation; National Key Research and Development Program;
Keywords
Biological system modeling; Data models; Training; Task analysis; Predictive models; Context modeling; Training data; Biomedical question answering; data augmentation; deep learning; model weighting;
DOI
10.1109/TCBB.2022.3171388
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Biomedical question answering aims to extract an answer to a given question from a biomedical context. Because biomedical text requires specialized domain expertise, large-scale datasets for domain-specific question answering are harder to build. Existing methods are limited by the lack of training data, and their performance falls short of that in open-domain settings, degrading especially on adversarial samples. We address these issues as follows. First, we adopt data augmentation strategies to improve model training, including sliding window, summarization, and round-trip translation. Second, we propose a model weighting strategy for final answer prediction in the biomedical domain, which combines the advantages of two models: the open-domain model QANet and BioBERT, which is pre-trained on biomedical domain data. Finally, we apply adversarial training to reinforce the robustness of the model. We evaluate our approach on the public biomedical dataset collected from PubMed and provided by the BioASQ challenge. The results show that model performance improves significantly compared with the single models and with other models that participated in the BioASQ challenge. The model can learn richer semantic representations from data augmentation and adversarial samples, which helps it solve more complex question answering problems in the biomedical domain.
Pages: 1114-1124
Page count: 11
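
Illustrative sketch (not taken from the paper): the abstract names a sliding-window augmentation and a weighting of two models' predictions but gives no formulas. The Python below shows one plausible reading of each idea. The function names, the window size and stride parameters, the linear weighted sum, and the candidate strings are all assumptions made for illustration; the authors' actual procedure may differ.

```python
from typing import Dict, List, Tuple


def sliding_windows(tokens: List[str], size: int, stride: int) -> List[Tuple[int, List[str]]]:
    """Split a long context into overlapping windows.

    Returns (start_offset, window_tokens) pairs. `size` and `stride`
    are hypothetical parameters, not values reported in the paper.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append((start, tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the context
        start += stride
    return windows


def weighted_answer(scores_a: Dict[str, float],
                    scores_b: Dict[str, float],
                    weight_a: float) -> str:
    """Pick the candidate with the highest weighted sum of two models' scores.

    Assumption: a simple linear combination, weight_a for model A and
    (1 - weight_a) for model B. A candidate missing from one model scores 0 there.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    candidates = set(scores_a) | set(scores_b)
    if not candidates:
        raise ValueError("no candidate answers supplied")

    def combined(candidate: str) -> float:
        return (weight_a * scores_a.get(candidate, 0.0)
                + (1.0 - weight_a) * scores_b.get(candidate, 0.0))

    # Sorting first makes tie-breaking deterministic.
    return max(sorted(candidates), key=combined)


if __name__ == "__main__":
    context = "a b c d e f g".split()
    for offset, window in sliding_windows(context, size=4, stride=2):
        print(offset, window)

    # Hypothetical scores from two models for two candidate answers.
    model_a = {"answer_x": 0.6, "answer_y": 0.4}
    model_b = {"answer_x": 0.3, "answer_y": 0.7}
    print(weighted_answer(model_a, model_b, weight_a=0.5))
```

In this reading, each window would become a separate training example when it still contains the answer span, and weight_a would be tuned on held-out data; both are design choices assumed here rather than stated in the abstract.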