PharmaBench: Enhancing ADMET benchmarks with large language models

Cited by: 0
Authors
Niu, Zhangming [1, 2]
Xiao, Xianglu [1, 3, 4]
Wu, Wenfan [1, 5, 6]
Cai, Qiwei [1]
Jiang, Yinghui [1]
Jin, Wangzhen [1]
Wang, Minhao [1]
Yang, Guojian [1]
Kong, Lingkang [1]
Jin, Xurui [1]
Yang, Guang [2, 3, 4, 7, 8]
Chen, Hongming [5, 6, 9]
Affiliations
[1] MindRank AI, Hangzhou, Zhejiang, Peoples R China
[2] Imperial Coll London, Natl Heart & Lung Inst, London SW7 2AZ, England
[3] Imperial Coll London, Bioengn Dept, London W12 7SL, England
[4] Imperial Coll London, Imperial X, London W12 7SL, England
[5] Huazhong Univ Sci & Technol, Coll Life Sci & Technol, Dept Bioinformat & Syst Biol, Wuhan, Hubei, Peoples R China
[6] Guangzhou Natl Lab, Guangzhou 510005, Peoples R China
[7] Royal Brompton Hosp, Cardiovasc Res Ctr, London SW3 6NP, England
[8] Kings Coll London, Sch Biomed Engn & Imaging Sci, London, England
[9] Guangzhou Med Univ, Sch Pharmaceut Sci, Guangzhou 511495, Peoples R China
Funding
EU Horizon 2020;
Keywords
PREDICTION;
DOI
10.1038/s41597-024-03793-0
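Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code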
07 ; 0710 ; 09 ;
Abstract
Accurately predicting ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) properties early in drug development is essential for selecting compounds with optimal pharmacokinetics and minimal toxicity. Existing ADMET-related benchmark sets are limited in utility due to their small dataset sizes and the lack of representation of compounds used in drug discovery projects. These shortcomings hinder their application in model building for drug discovery. To address this issue, we propose a multi-agent data mining system based on Large Language Models that effectively identifies experimental conditions within 14,401 bioassays. This approach facilitates merging entries from different sources, culminating in the creation of PharmaBench. Additionally, we have developed a data processing workflow to integrate data from various sources, resulting in 156,618 raw entries. Through this workflow, we constructed PharmaBench, a comprehensive benchmark set for ADMET properties, which comprises eleven ADMET datasets and 52,482 entries. This benchmark set is designed to serve as an open-source dataset for the development of AI models relevant to drug discovery projects.
Pages: 15