BioRel: A Large-Scale Dataset for Biomedical Relation Extraction

Cited by: 0
Authors
Xing, Rui [1]
Luo, Jie [1]
Song, Tengwei [1]
Affiliations
[1] Beihang Univ, State Key Lab Software Dev Environm, Sch Comp Sci & Engn, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
distant supervision; relation extraction; dataset; Medline
DOI
Not available
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Valuable biomedical knowledge usually exists in the form of electronic publications and literature, which are growing at an enormous rate. Relation extraction plays a critical role in discovering such knowledge and transforming it into structured form. Previous relation extraction datasets in the biomedical domain are mainly human-annotated, and their scale is usually limited because annotation is labor-intensive and time-consuming. In this paper, we present BioRel, a large-scale dataset constructed using the Unified Medical Language System (UMLS) as the knowledge base and Medline as the corpus. Entities in Medline sentences are identified and linked to UMLS by MetaMap. The relation label for each sentence is assigned using distant supervision. We adapt both state-of-the-art deep learning and statistical machine learning methods as baseline models and conduct comprehensive experiments on BioRel. Experimental results show that BioRel is suitable for training and evaluating relation extraction models of both kinds, offering reasonable baseline performance while leaving many challenges open.
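The distant supervision step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy knowledge base, the concept identifiers, the relation name `may_treat`, the `NA` fallback label, and the helper names are all hypothetical, and the entity linking that MetaMap would perform is assumed to have already happened.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: (head concept, tail concept) -> relation.
# The identifiers and the relation name are placeholders, not real UMLS entries.
KB: Dict[Tuple[str, str], str] = {
    ("C_DRUG_1", "C_DISEASE_1"): "may_treat",
}

# Assumed fallback label for entity pairs with no relation in the knowledge base.
NA = "NA"


def label_sentence(sentence: str, head: str, tail: str) -> dict:
    """Label one sentence by looking up its linked entity pair in the KB."""
    relation = KB.get((head, tail), NA)
    return {"sentence": sentence, "head": head, "tail": tail, "relation": relation}


def build_dataset(linked: List[Tuple[str, str, str]]) -> List[dict]:
    """Apply distant supervision to (sentence, head, tail) triples."""
    return [label_sentence(s, h, t) for s, h, t in linked]


# Entity linking is assumed to be done already; these triples are made up.
examples = build_dataset([
    ("Drug one reduced symptoms of disease one.", "C_DRUG_1", "C_DISEASE_1"),
    ("Drug one was given to patients with disease two.", "C_DRUG_1", "C_DISEASE_2"),
])

for ex in examples:
    print(ex["relation"], "|", ex["sentence"])
```

Because the label comes from the knowledge base rather than from the sentence itself, a sentence that merely mentions both entities receives the same label as one that actually expresses the relation, which is one source of the noise that distantly supervised datasets have to contend with.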
Pages: 1801-1808
Page count: 8