Large language model based framework for automated extraction of genetic interactions from unstructured data

Cited by: 1
Authors
Gill, Jaskaran Kaur [1]
Chetty, Madhu [1]
Lim, Suryani [1]
Hallinan, Jennifer [1, 2]
Affiliations
[1] Federat Univ, Hlth Innovat & Transformat Ctr, Ballarat, Vic, Australia
[2] BioThink, Brisbane, Qld, Australia
Source
PLOS ONE | 2024 / Vol. 19 / Issue 05
Keywords
NEURAL-NETWORK; ENTITY; INTEGRATION
DOI
10.1371/journal.pone.0303231
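Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]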
Discipline classification codes
07; 0710; 09
Abstract
Extracting biological interactions from published literature helps us understand complex biological systems, accelerate research, and support decision-making in drug or treatment development. Despite efforts to automate the extraction of biological relations using text mining tools and machine learning pipelines, manual curation continues to serve as the gold standard. However, the rapidly increasing volume of literature pertaining to biological relations poses challenges in its manual curation and refinement. These challenges are further compounded because only a small fraction of the published literature is relevant to biological relation extraction, and the embedded sentences of relevant sections have complex structures, which can lead to incorrect inference of relationships. To overcome these challenges, we propose GIX, an automated and robust Gene Interaction Extraction framework, based on pre-trained Large Language models fine-tuned through extensive evaluations on various gene/protein interaction corpora including LLL and RegulonDB. GIX identifies relevant publications with minimal keywords, optimises sentence selection to reduce computational overhead, simplifies sentence structure while preserving meaning, and provides a confidence factor indicating the reliability of extracted relations. GIX's Stage-2 relation extraction method performed well on benchmark protein/gene interaction datasets, assessed using 10-fold cross-validation, surpassing state-of-the-art approaches. We demonstrated that the proposed method, although fully automated, performs as well as manual relation extraction, with enhanced robustness. We also observed GIX's capability to augment existing datasets with new sentences, incorporating newly discovered biological terms and processes. Further, we demonstrated GIX's real-world applicability in inferring E. coli gene circuits.
Pages: 22