Retrieval Enhanced Data Augmentation for Question Answering on Privacy Policies

Cited by: 0
Authors
Parvez, Md Rizwan [1]
Chi, Jianfeng [2]
Ahmad, Wasi Uddin [1]
Tian, Yuan [1]
Chang, Kai-Wei [1]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90095 USA
[2] Univ Virginia, Charlottesville, VA USA
Funding
U.S. National Science Foundation
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Prior studies on privacy policies frame the question answering (QA) task as identifying the most relevant text segment, or a list of sentences, in a policy document given a user query. Existing labeled datasets are heavily imbalanced (only a few segments are relevant), which limits QA performance in this domain. In this paper, we develop a data augmentation framework based on an ensemble of retriever models that captures relevant text segments from unlabeled policy documents and expands the positive examples in the training set. In addition, to improve the diversity and quality of the augmented data, we leverage multiple pre-trained language models (LMs) and cascade them with noise reduction filter models. Using our augmented data on the PrivacyQA benchmark, we improve on the existing baseline by a large margin (10% F1) and achieve a new state-of-the-art F1 score of 50%. Our ablation studies provide further insights into the effectiveness of our approach.
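The retrieve-then-filter idea in the abstract (score unlabeled policy segments with several retrievers, merge their rankings, and pass the top candidates through noise-reduction filters before adding them as new positive examples) can be illustrated with a minimal sketch. It is not the authors' implementation and assumes a lot: the two lexical scorers stand in for the paper's retriever models, reciprocal rank fusion is a swapped-in ensembling rule, the pre-trained LMs are not modeled at all, and the length and overlap checks stand in for the paper's filter models. Every function name, threshold, and example sentence below is hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence

# Hypothetical interfaces: a retriever scores a (query, segment) pair, and a
# filter decides whether a retrieved segment is kept as a positive example.
Scorer = Callable[[str, str], float]
Filter = Callable[[str, str], bool]

PUNCT = ".,;:!?()\"'"


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenization with surrounding punctuation stripped."""
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]


def jaccard_scorer(query: str, segment: str) -> float:
    """Stand-in retriever 1: Jaccard overlap of the two token sets."""
    q, s = set(tokenize(query)), set(tokenize(segment))
    union = q | s
    return len(q & s) / len(union) if union else 0.0


def cosine_scorer(query: str, segment: str) -> float:
    """Stand-in retriever 2: cosine similarity of raw term-count vectors."""
    q, s = Counter(tokenize(query)), Counter(tokenize(segment))
    dot = sum(count * s[term] for term, count in q.items())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in s.values()))
    return dot / norm if norm else 0.0


def rrf_ensemble(query: str, segments: Sequence[str],
                 retrievers: Sequence[Scorer], k: int = 60) -> List[int]:
    """Fuse per-retriever rankings with reciprocal rank fusion: sum of 1 / (k + rank)."""
    fused: Dict[int, float] = {i: 0.0 for i in range(len(segments))}
    for score in retrievers:
        ranked = sorted(range(len(segments)),
                        key=lambda i: score(query, segments[i]), reverse=True)
        for rank, idx in enumerate(ranked, start=1):
            fused[idx] += 1.0 / (k + rank)
    # Segment indices ordered from best to worst fused score.
    return sorted(fused, key=fused.get, reverse=True)


def min_length_filter(query: str, segment: str, min_tokens: int = 5) -> bool:
    """Stand-in noise filter: drop very short segments."""
    return len(tokenize(segment)) >= min_tokens


def min_overlap_filter(query: str, segment: str, threshold: float = 0.1) -> bool:
    """Stand-in noise filter: drop segments with almost no lexical overlap."""
    return jaccard_scorer(query, segment) >= threshold


def augment_positives(query: str, unlabeled_segments: Sequence[str],
                      retrievers: Sequence[Scorer], filters: Sequence[Filter],
                      top_k: int = 3) -> List[str]:
    """Take the ensemble's top-k candidates and keep those that pass every filter."""
    candidates = rrf_ensemble(query, unlabeled_segments, retrievers)[:top_k]
    return [unlabeled_segments[i] for i in candidates
            if all(keep(query, unlabeled_segments[i]) for keep in filters)]


segments = [
    "We collect your location data to provide nearby recommendations.",
    "Our office is closed on public holidays.",
    "Location data may be shared with advertising partners.",
    "Contact us by email.",
]
query = "Do you share my location data?"
new_positives = augment_positives(query, segments,
                                  [jaccard_scorer, cosine_scorer],
                                  [min_length_filter, min_overlap_filter])
# Segments 2 and 0 are kept; segment 1 is retrieved but fails the overlap filter.
print(new_positives)
```

Fusing ranks rather than raw scores is one simple way to combine retrievers whose scores live on different scales; the cascade of filters then trades recall for precision, which matters when augmented positives are added to a heavily imbalanced training set.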
Pages: 201-210
Page count: 10
Related papers
50 results in total
  • [1] Retrieval Data Augmentation Informed by Downstream Question Answering Performance
    Ferguson, James
    Dasigi, Pradeep
    Khot, Tushar
    Hajishirzi, Hannaneh
    [J]. PROCEEDINGS OF THE FIFTH FACT EXTRACTION AND VERIFICATION WORKSHOP (FEVER 2022), 2022, : 1 - 5
  • [2] Neural Retrieval for Question Answering with Cross-Attention Supervised Data Augmentation
    Yang, Yinfei
    Jin, Ning
    Lin, Kuo
    Guo, Mandy
    Cer, Daniel
    [J]. ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2, 2021, : 263 - 268
  • [3] Data Augmentation Method for Question Answering
    Ding, Jiajie
    Xiao, Kang
    Ye, Heng
    Zhou, Xiabing
    Zhang, Min
    [J]. Beijing Daxue Xuebao (Ziran Kexue Ban)/Acta Scientiarum Naturalium Universitatis Pekinensis, 2022, 58 (01): : 54 - 60
  • [4] Data Augmentation for Biomedical Factoid Question Answering
    Pappas, Dimitris
    Malakasiotis, Prodromos
    Androutsopoulos, Ion
    [J]. PROCEEDINGS OF THE 21ST WORKSHOP ON BIOMEDICAL LANGUAGE PROCESSING (BIONLP 2022), 2022, : 63 - 81
  • [5] Question Answering for Privacy Policies: Combining Computational and Legal Perspectives
    Ravichander, Abhilasha
    Black, Alan
    Wilson, Shomir
    Norton, Thomas
    Sadeh, Norman
    [J]. 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 4947 - 4958
  • [6] RECIPE: Applying Open Domain Question Answering to Privacy Policies
    Shvartzshnaider, Yan
    Balashankar, Ananth
    Wies, Thomas
    Subramanian, Lakshminarayanan
    [J]. MACHINE READING FOR QUESTION ANSWERING, 2018, : 71 - 77
  • [7] Knowledge-Enhanced Retrieval: A Scheme for Question Answering
    Lin, Fake
    Cao, Weican
    Zhang, Wen
    Chen, Liyi
    Hong, Yuan
    Xu, Tong
    Tan, Chang
    [J]. CCKS 2021 - EVALUATION TRACK, 2022, 1553 : 102 - 113
  • [8] Harnessing the Power of Metadata for Enhanced Question Retrieval in Community Question Answering
    Ghasemi, Shima
    Shakery, Azadeh
    [J]. IEEE ACCESS, 2024, 12 : 65768 - 65779
  • [9] Rethinking Data Augmentation for Robust Visual Question Answering
    Chen, Long
    Zheng, Yuhang
    Xiao, Jun
    [J]. COMPUTER VISION, ECCV 2022, PT XXXVI, 2022, 13696 : 95 - 112
  • [10] Question Answering Models for Privacy Policies of Mobile Apps: Are We There Yet?
    Alkhattabi, Khalid
    Bird, Davita
    Miller, Kai
    Yue, Chuan
    [J]. SCIENCE OF CYBER SECURITY, SCISEC 2022, 2022, 13580 : 333 - 352