LSOIE: A Large-Scale Dataset for Supervised Open Information Extraction

Cited by: 0
Authors
Solawetz, Jacob [1]
Larson, Stefan [2]
Affiliations
[1] Roboflow Inc, Minneapolis, MN 55414 USA
[2] Rosegold AI, Ann Arbor, MI USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Open Information Extraction (OIE) systems seek to compress the factual propositions of a sentence into a series of n-ary tuples. These tuples are useful for downstream natural language processing tasks such as knowledge base creation, textual entailment, and natural language understanding. However, current OIE datasets are limited in both size and diversity. We introduce a new dataset by converting the QA-SRL 2.0 dataset to a large-scale OIE dataset (LSOIE). Our LSOIE dataset is 20 times larger than the next largest human-annotated OIE dataset. We construct and evaluate several benchmark OIE models on LSOIE, providing baselines for future improvements on the task. Our LSOIE data, models, and code are made publicly available.
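To make the notion of an n-ary tuple concrete, the following is a minimal illustrative sketch of how a single extraction might be represented. The `Extraction` class, its field names, and the example sentence are hypothetical and are not taken from the LSOIE data format or code release.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Extraction:
    """Hypothetical container for one OIE proposition (not the LSOIE schema)."""
    predicate: str
    arguments: Tuple[str, ...]  # assumed to hold at least one argument

    def as_tuple(self) -> Tuple[str, ...]:
        # Lay the proposition out as (arg0, predicate, arg1, ..., argN).
        return (self.arguments[0], self.predicate) + self.arguments[1:]


# Made-up example sentence, used only for illustration.
sentence = "Alice gave Bob a book in Paris."
extractions = [
    Extraction(predicate="gave",
               arguments=("Alice", "Bob", "a book", "in Paris")),
]

for extraction in extractions:
    print(extraction.as_tuple())
```

A sentence with several predicates would simply yield several such objects, one per proposition.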
Pages: 2595-2600
Page count: 6