PolicyQA: A Reading Comprehension Dataset for Privacy Policies

Cited by: 0
Authors
Ahmad, Wasi Uddin [1]
Chi, Jianfeng [2]
Tian, Yuan [2]
Chang, Kai-Wei [1]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90095 USA
[2] Univ Virginia, Charlottesville, VA 22903 USA
Funding
US National Science Foundation
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Privacy policy documents are long and verbose. A question answering (QA) system can help users find the information that is relevant and important to them. Prior studies in this domain frame the QA task as retrieving the most relevant text segment or a list of sentences from the policy document given a question. In contrast, we argue that providing users with a short text span from the policy document spares them from searching for the target information in a lengthy text segment. In this paper, we present PolicyQA, a dataset that contains 25,017 reading-comprehension-style examples curated from an existing corpus of 115 website privacy policies. PolicyQA provides 714 human-annotated questions written for a wide range of privacy practices. We evaluate two existing neural QA models and perform rigorous analysis to reveal the advantages and challenges offered by PolicyQA.
Pages: 743-749
Page count: 7