Harnessing large language models (LLMs) for candidate gene prioritization and selection

Cited by: 9

Authors:

Toufiq, Mohammed [1]

Rinchai, Darawan [2]

Bettacchioli, Eleonore [3,4]

Kabeer, Basirudeen Syed Ahamed [5]

Khan, Taushif [1]

Subba, Bishesh [1]

White, Olivia [1]

Yurieva, Marina [1]

George, Joshy [1]

Jourde-Chiche, Noemie [6]

Chiche, Laurent [7]

Palucka, Karolina [1]

Chaussabel, Damien [1]

Affiliations:

[1] Jackson Lab Genom Med, Farmington, CT 06032 USA

[2] Rockefeller Univ, New York, NY USA

[3] Univ Bretagne Occidentale, INSERM UMR1227, Lymphocytes B & Autoimmunite, Brest, France

[4] CHU Brest, Serv Rhumatol, Brest, France

[5] Sidra Med, Doha, Qatar

[6] Hop La Conception, Serv Nephrol, Marseille, France

[7] Hop Europeen, Serv Med Interne, Marseille, France

Source:

JOURNAL OF TRANSLATIONAL MEDICINE | 2023 / Volume 21 / Issue 01

Keywords:

Transcriptomics; Erythroid cells; Feature selection; Large language models; Generative artificial intelligence; CARBONIC-ANHYDRASE II; HUMAN RED-CELLS; ERYTHROID-CELLS; EXPRESSION; CANCER; FERROCHELATASE; TRANSPORT; SYNTHASE; BICARBONATE; INHIBITORS;

DOI:

10.1186/s12967-023-04576-8

Chinese Library Classification (CLC) number:

R-3 [Medical research methods]; R3 [Basic medicine];

Discipline classification code:

1001;

Abstract:

Background: Feature selection is a critical step for translating advances afforded by systems-scale molecular profiling into actionable clinical insights. While data-driven methods are commonly utilized for selecting candidate genes, knowledge-driven methods must contend with the challenge of efficiently sifting through extensive volumes of biomedical information. This work aimed to assess the utility of large language models (LLMs) for knowledge-driven gene prioritization and selection.

Methods: In this proof of concept, we focused on 11 blood transcriptional modules associated with an Erythroid cells signature. We evaluated four leading LLMs across multiple tasks. Next, we established a workflow leveraging LLMs. The steps consisted of: (1) Selecting one of the 11 modules; (2) Identifying functional convergences among constituent genes using the LLMs; (3) Scoring candidate genes across six criteria capturing the gene's biological and clinical relevance; (4) Prioritizing candidate genes and summarizing justifications; (5) Fact-checking justifications and identifying supporting references; (6) Selecting a top candidate gene based on validated scoring justifications; and (7) Factoring in transcriptome profiling data to finalize the selection of the top candidate gene.

Results: Of the four LLMs evaluated, OpenAI's GPT-4 and Anthropic's Claude demonstrated the best performance and were chosen for the implementation of the candidate gene prioritization and selection workflow. This workflow was run in parallel for each of the 11 erythroid cell modules by participants in a data mining workshop. Module M9.2 served as an illustrative use case. The 30 candidate genes forming this module were assessed, and the top five scoring genes were identified as BCL2L1, ALAS2, SLC4A1, CA1, and FECH. Researchers carefully fact-checked the summarized scoring justifications, after which the LLMs were prompted to select a top candidate based on this information. GPT-4 initially chose BCL2L1, while Claude selected ALAS2. When transcriptional profiling data from three reference datasets were provided for additional context, GPT-4 revised its initial choice to ALAS2, whereas Claude reaffirmed its original selection for this module.

Conclusions: Taken together, our findings highlight the ability of LLMs to prioritize candidate genes with minimal human intervention. This suggests the potential of this technology to boost productivity, especially for tasks that require leveraging extensive biomedical knowledge.
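
The scoring and prioritization steps (3) and (4) described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the abstract does not name the six criteria, the scoring scale, or the aggregation rule, so the criterion names (`criterion_1` to `criterion_6`), the simple sum used as the aggregate, the `GeneScore` class, the `rank_genes` helper, and the placeholder genes and scores are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical criterion names: the abstract states that six criteria were
# used but does not name them.
CRITERIA = [f"criterion_{i}" for i in range(1, 7)]


@dataclass
class GeneScore:
    """Per-criterion scores for one candidate gene (illustrative structure)."""
    gene: str
    scores: dict  # maps criterion name -> numeric score

    @property
    def total(self) -> float:
        # Assumption: criteria are aggregated by an unweighted sum.
        return sum(self.scores[c] for c in CRITERIA)


def rank_genes(gene_scores, top_n=5):
    """Return the top_n genes by total score; ties are broken by gene name."""
    ordered = sorted(gene_scores, key=lambda g: (-g.total, g.gene))
    return ordered[:top_n]


# Placeholder genes and scores, invented for this sketch only.
example = [
    GeneScore("GENE_A", dict.fromkeys(CRITERIA, 5)),
    GeneScore("GENE_B", dict.fromkeys(CRITERIA, 8)),
    GeneScore("GENE_C", dict.fromkeys(CRITERIA, 6)),
    GeneScore("GENE_D", dict.fromkeys(CRITERIA, 8)),
]

if __name__ == "__main__":
    for g in rank_genes(example, top_n=3):
        print(g.gene, g.total)
```

In a workflow like the one described, the per-criterion scores would come from LLM responses that are then fact-checked by researchers before a top candidate is chosen; the sketch covers only the ranking arithmetic.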

Pages: 33
