PotatoG-DKB: a potato gene-disease knowledge base mined from biological literature

Authors
Xie, Congjiao [1, 2]
Gao, Jing [1, 2, 3]
Chen, Junjie [1, 2]
Zhao, Xuyang [1]
Affiliations
[1] Inner Mongolia Agr Univ, Coll Comp & Informat Engn, Hohhot, Inner Mongolia, Peoples R China
[2] Inner Mongolia Autonomous Reg Key Lab Big Data Res, Hohhot, Inner Mongolia, Peoples R China
[3] Inner Mongolia Autonomous Reg Govt Serv & Data Man, Hohhot, Inner Mongolia, Peoples R China
Source
PeerJ | 2024 | Volume 12
Keywords
Knowledge base; Literature mining; Potato; Large language model; Disease;
DOI
10.7717/peerj.18202
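Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract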
Background: Potato is the fourth largest food crop in the world, but potato cultivation faces serious threats from various diseases and pests. Despite significant advances in research on potato disease resistance, the findings are scattered across numerous publications. For researchers, obtaining relevant knowledge by reading and organizing a large body of literature is time-consuming and labor-intensive. Systematically extracting and organizing the relationships between potato genes and diseases from the literature to establish a potato gene-disease knowledge base is therefore particularly important. Unfortunately, no such gene-disease knowledge base is currently available.
Methods: In this study, we constructed a Potato Gene-Disease Knowledge Base (PotatoG-DKB) using natural language processing techniques and large language models. We used PubMed as the data source and obtained 2,906 article abstracts related to potato biology, extracted entities and relationships between potato genes and related diseases, and stored them in a Neo4j database. Using web technology, we also constructed the Potato Gene-Disease Knowledge Portal (PotatoG-DKP), an interactive visualization platform.
Results: PotatoG-DKB comprises 5,206 nodes across 22 entity types (such as genes, diseases, and species) and 9,443 edges between entities (for example, gene-disease and pathogen-disease relationships). PotatoG-DKP intuitively displays the associative relationships extracted from the literature and can help potato biologists and breeders understand potato pathogenesis and disease resistance. More details about PotatoG-DKP are available at https://www.potatogd.com.cn/.
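The storage step described in the Methods can be illustrated with a minimal sketch. It assumes extracted relations arrive as (head, head type, relation, tail, tail type) tuples and turns each one into a parameterized Cypher MERGE statement of the kind a Neo4j database accepts. The tuple layout, the relation names, the function names, and the two example triples are all hypothetical illustrations; they are not the authors' schema, code, or data.

```python
import re

# Illustrative triples only (assumed layout: head, head_type, relation, tail, tail_type).
# They are not records taken from PotatoG-DKB.
TRIPLES = [
    ("Phytophthora infestans", "Pathogen", "CAUSES", "late blight", "Disease"),
    ("R1", "Gene", "CONFERS_RESISTANCE_TO", "late blight", "Disease"),
]

# Cypher cannot take node labels or relationship types as query parameters,
# so they are interpolated into the query text and must be validated first.
IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def to_cypher(triple):
    """Return (query, params) that idempotently stores one triple."""
    head, head_type, relation, tail, tail_type = triple
    for ident in (head_type, relation, tail_type):
        if not IDENT.match(ident):
            raise ValueError(f"unsafe label or relationship type: {ident!r}")
    query = (
        f"MERGE (h:{head_type} {{name: $head}}) "
        f"MERGE (t:{tail_type} {{name: $tail}}) "
        f"MERGE (h)-[:{relation}]->(t)"
    )
    # Entity names travel as parameters, never as query text.
    return query, {"head": head, "tail": tail}


def summarize(triples):
    """Count distinct nodes, edges, and entity types implied by the triples."""
    nodes, edges = set(), set()
    for head, head_type, relation, tail, tail_type in triples:
        nodes.add((head_type, head))
        nodes.add((tail_type, tail))
        edges.add((head_type, head, relation, tail_type, tail))
    entity_types = {node_type for node_type, _ in nodes}
    return {"nodes": len(nodes), "edges": len(edges), "entity_types": len(entity_types)}


if __name__ == "__main__":
    for triple in TRIPLES:
        query, params = to_cypher(triple)
        print(query, params)
    print(summarize(TRIPLES))
```

MERGE is used rather than CREATE so that an entity mentioned in many abstracts ends up as a single node; the summary function mirrors the node, edge, and entity-type counts reported in the Results.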
Pages: 19