Harnessing large language models (LLMs) for candidate gene prioritization and selection

Cited by: 9
Authors
Toufiq, Mohammed [1 ]
Rinchai, Darawan [2 ]
Bettacchioli, Eleonore [3 ,4 ]
Kabeer, Basirudeen Syed Ahamed [5 ]
Khan, Taushif [1 ]
Subba, Bishesh [1 ]
White, Olivia [1 ]
Yurieva, Marina [1 ]
George, Joshy [1 ]
Jourde-Chiche, Noemie [6 ]
Chiche, Laurent [7 ]
Palucka, Karolina [1 ]
Chaussabel, Damien [1 ]
Affiliations
[1] Jackson Lab Genom Med, Farmington, CT 06032 USA
[2] Rockefeller Univ, New York, NY USA
[3] Univ Bretagne Occidentale, INSERM UMR1227, Lymphocytes B & Autoimmunite, Brest, France
[4] CHU Brest, Serv Rhumatol, Brest, France
[5] Sidra Med, Doha, Qatar
[6] Hop La Conception, Serv Nephrol, Marseille, France
[7] Hop Europeen, Serv Med Interne, Marseille, France
Keywords
Transcriptomics; Erythroid cells; Feature selection; Large language models; Generative artificial intelligence; CARBONIC-ANHYDRASE II; HUMAN RED-CELLS; ERYTHROID-CELLS; EXPRESSION; CANCER; FERROCHELATASE; TRANSPORT; SYNTHASE; BICARBONATE; INHIBITORS;
DOI
10.1186/s12967-023-04576-8
CLC classification
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline code
1001
Abstract
Background: Feature selection is a critical step for translating advances afforded by systems-scale molecular profiling into actionable clinical insights. While data-driven methods are commonly utilized for selecting candidate genes, knowledge-driven methods must contend with the challenge of efficiently sifting through extensive volumes of biomedical information. This work aimed to assess the utility of large language models (LLMs) for knowledge-driven gene prioritization and selection.
Methods: In this proof of concept, we focused on 11 blood transcriptional modules associated with an erythroid cell signature. We evaluated four leading LLMs across multiple tasks. Next, we established a workflow leveraging LLMs. The steps consisted of: (1) selecting one of the 11 modules; (2) identifying functional convergences among constituent genes using the LLMs; (3) scoring candidate genes across six criteria capturing each gene's biological and clinical relevance; (4) prioritizing candidate genes and summarizing justifications; (5) fact-checking justifications and identifying supporting references; (6) selecting a top candidate gene based on validated scoring justifications; and (7) factoring in transcriptome profiling data to finalize the selection of the top candidate gene.
Results: Of the four LLMs evaluated, OpenAI's GPT-4 and Anthropic's Claude demonstrated the best performance and were chosen to implement the candidate gene prioritization and selection workflow. This workflow was run in parallel for each of the 11 erythroid cell modules by participants in a data mining workshop. Module M9.2 served as an illustrative use case. The 30 candidate genes forming this module were assessed, and the top five scoring genes were identified as BCL2L1, ALAS2, SLC4A1, CA1, and FECH. Researchers carefully fact-checked the summarized scoring justifications, after which the LLMs were prompted to select a top candidate based on this information. GPT-4 initially chose BCL2L1, while Claude selected ALAS2. When transcriptional profiling data from three reference datasets were provided for additional context, GPT-4 revised its initial choice to ALAS2, whereas Claude reaffirmed its original selection for this module.
Conclusions: Taken together, our findings highlight the ability of LLMs to prioritize candidate genes with minimal human intervention. This suggests the potential of this technology to boost productivity, especially for tasks that require leveraging extensive biomedical knowledge.
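The scoring-and-prioritization steps of the workflow (steps 3 and 4) can be sketched in code. This is a minimal illustrative sketch, not the authors' implementation: the six criterion names are hypothetical placeholders for the biological/clinical relevance criteria the abstract mentions, and `ask_llm` is an assumed caller-supplied function wrapping whichever LLM API (e.g. GPT-4 or Claude) is used.

```python
from typing import Callable, Dict, List

# Hypothetical criterion names standing in for the six biological and
# clinical relevance criteria described in the abstract (illustrative only).
CRITERIA = [
    "expression_in_relevant_cells",
    "known_biological_function",
    "disease_association",
    "druggability",
    "biomarker_potential",
    "literature_support",
]


def score_gene(gene: str, module: str,
               ask_llm: Callable[[str], int]) -> Dict[str, int]:
    """Prompt the LLM once per criterion and collect integer scores (step 3)."""
    scores: Dict[str, int] = {}
    for criterion in CRITERIA:
        prompt = (
            f"On a scale of 0-10, score the gene {gene} from module {module} "
            f"for the criterion '{criterion}'. Reply with a single integer."
        )
        scores[criterion] = ask_llm(prompt)
    return scores


def prioritize(genes: List[str], module: str,
               ask_llm: Callable[[str], int], top_n: int = 5) -> List[str]:
    """Rank genes by total score across all criteria; return the top_n (step 4)."""
    totals = {g: sum(score_gene(g, module, ask_llm).values()) for g in genes}
    return sorted(genes, key=lambda g: totals[g], reverse=True)[:top_n]
```

In the published workflow the LLM also returns textual justifications that are fact-checked by researchers (step 5); the sketch above covers only the numeric scoring and ranking.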
Pages: 33