Molecular Representations in Machine-Learning-Based Prediction of PK Parameters for Insulin Analogs

Cited by: 3
Authors
Einarson, Kasper A. [1, 2]
Bendtsen, Kristian M. [3]
Li, Kang [3]
Thomsen, Maria [3]
Kristensen, Niels R. [4]
Winther, Ole [1, 5, 6]
Fulle, Simone [3]
Clemmensen, Line [1]
Refsgaard, Hanne H. F. [2]
Affiliations
[1] Technical University of Denmark (DTU), Department of Applied Mathematics and Computer Science, DK-2800 Kongens Lyngby, Denmark
[2] Novo Nordisk A/S, Global Drug Discovery, Research & Early Development (R&ED), DK-2760 Måløv, Denmark
[3] Novo Nordisk A/S, Digital Science & Innovation, R&ED, DK-2760 Måløv, Denmark
[4] Novo Nordisk A/S, Data Science, Development, DK-2860 Søborg, Denmark
[5] Copenhagen University Hospital (Rigshospitalet), Center for Genomic Medicine, DK-2100 Copenhagen, Denmark
[6] University of Copenhagen, Department of Biology, Bioinformatics Centre, DK-2200 Copenhagen, Denmark
Source
ACS Omega, 2023, Vol. 8, No. 26
Keywords
Technology; Discovery
DOI
10.1021/acsomega.3c01218
Chinese Library Classification
O6 [Chemistry]
Subject Classification Code
0703
Abstract
Therapeutic peptides and proteins derived either from endogenous hormones, such as insulin, or from de novo design via display technologies occupy a distinct pharmaceutical space between small molecules and large proteins such as antibodies. Optimizing the pharmacokinetic (PK) profile of drug candidates is highly important when prioritizing lead candidates, and machine-learning models can provide a relevant tool to accelerate the drug design process. Predicting PK parameters of proteins remains difficult because of the complex factors that influence PK properties; furthermore, the data sets are small compared to the variety of compounds in the protein space. This study describes a novel combination of molecular descriptors for proteins such as insulin analogs, many of which contained chemical modifications, e.g., attached small molecules that protract the half-life. The underlying data set consisted of 640 structurally diverse insulin analogs, around half of which had attached small molecules. Other analogs were conjugated to peptides, amino acid extensions, or fragment crystallizable (Fc) regions. The PK parameters clearance (CL), half-life (T1/2), and mean residence time (MRT) could be predicted with classical machine-learning models such as Random Forest (RF) and Artificial Neural Networks (ANN), with root-mean-square errors for CL of 0.60 and 0.68 (log units) and average fold errors of 2.5 and 2.9 for RF and ANN, respectively. Both random and temporal data splits were used to evaluate ideal and prospective model performance, and the best models, regardless of data splitting, achieved at least 70% of predictions within a twofold error.
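The abstract reports three error measures (RMSE in log units, average fold error, and the share of predictions within a twofold error) and two data-splitting schemes. The Python sketch below shows one common way to compute them. It assumes base-10 logarithms and the absolute average fold error, 10^mean(|log10(predicted/observed)|), because the abstract does not state the exact formulas. The split helpers and their `test_fraction`, `seed`, and `date_key` parameters are hypothetical illustrations, not the authors' code.

```python
import math
import random
from typing import List, Sequence, Tuple


def log_errors(observed: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Per-compound error on the log10 scale: log10(predicted) - log10(observed)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    return [math.log10(p) - math.log10(o) for o, p in zip(observed, predicted)]


def rmse_log(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error in log10 units (assumed base; the abstract says 'log units')."""
    errs = log_errors(observed, predicted)
    return math.sqrt(sum(e * e for e in errs) / len(errs))


def average_fold_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Absolute average fold error, 10 ** mean(|log10(pred / obs)|); 1.0 means perfect."""
    errs = log_errors(observed, predicted)
    return 10 ** (sum(abs(e) for e in errs) / len(errs))


def fraction_within_fold(observed: Sequence[float], predicted: Sequence[float],
                         fold: float = 2.0) -> float:
    """Share of predictions within `fold`-fold of the observed value."""
    errs = log_errors(observed, predicted)
    limit = math.log10(fold) + 1e-12  # small tolerance for floating-point round-off
    return sum(abs(e) <= limit for e in errs) / len(errs)


def _n_test(n: int, test_fraction: float) -> int:
    """Test-set size, keeping at least one compound on each side of the split."""
    if n < 2 or not 0.0 < test_fraction < 1.0:
        raise ValueError("need at least 2 records and 0 < test_fraction < 1")
    return min(n - 1, max(1, round(n * test_fraction)))


def random_split(records: Sequence[dict], test_fraction: float = 0.2,
                 seed: int = 0) -> Tuple[List[dict], List[dict]]:
    """Shuffle with a fixed seed, then hold out a random test set (the 'ideal' setting)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = _n_test(len(shuffled), test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def temporal_split(records: Sequence[dict], date_key: str,
                   test_fraction: float = 0.2) -> Tuple[List[dict], List[dict]]:
    """Train on the earliest compounds, test on the most recent (a prospective setting)."""
    ordered = sorted(records, key=lambda r: r[date_key])
    n_test = _n_test(len(ordered), test_fraction)
    return ordered[:-n_test], ordered[-n_test:]


if __name__ == "__main__":
    # Hypothetical toy values on a linear scale (e.g. clearance), not data from the study.
    observed = [1.0, 2.0, 10.0, 5.0]
    predicted = [2.0, 1.0, 10.0, 20.0]
    print(f"RMSE (log10): {rmse_log(observed, predicted):.2f}")                # 0.37
    print(f"AFE: {average_fold_error(observed, predicted):.2f}")               # 2.00
    print(f"within 2-fold: {fraction_within_fold(observed, predicted):.2f}")   # 0.75
```

In the toy example three of four predictions are exactly 1- or 2-fold off and one is 4-fold off, so the average fold error lands at 2.0 while only 75% of predictions fall within a twofold error. A temporal split is usually the harder test, since the most recent compounds can lie outside the chemical space seen during training.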
The tested molecular representations include (1) global physicochemical descriptors combined with descriptors encoding the amino acid composition of the insulin analogs, (2) physicochemical descriptors of the attached small molecule, (3) a protein language model (evolutionary scale modeling, ESM) embedding of the amino acid sequence of the molecules, and (4) a natural-language-processing-inspired embedding (mol2vec) of the attached small molecule. Encoding the attached small molecule via (2) or (4) significantly improved the predictions, whereas the benefit of the protein language model-based encoding (3) depended on the machine-learning model used. Shapley additive explanation (SHAP) values identified descriptors related to the molecular size of both the protein and the protraction part as the most important. Overall, the results show that combining representations of proteins and small molecules was key for PK predictions of insulin analogs.
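To make "combining representations" concrete, the Python sketch below concatenates per-analog feature blocks into one vector that a model such as RF could consume. Everything here is illustrative: the function names, the fixed 20-letter amino acid ordering, ignoring non-standard residues, and the zero-filled block plus 0/1 indicator for analogs without an attached small molecule are assumptions, not the study's published feature pipeline. The global descriptors, the protein language model embedding, and the small-molecule features (descriptors or mol2vec) are taken as precomputed inputs.

```python
from typing import List, Optional, Sequence

# The 20 standard amino acids in one-letter code, in a fixed order so that
# every analog gets the same feature layout (hypothetical convention).
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Fraction of each standard amino acid; non-standard letters are ignored."""
    counts = {aa: 0 for aa in STANDARD_AA}
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in STANDARD_AA]


def combine_representations(
    protein_sequence: str,
    global_descriptors: Sequence[float],
    protein_embedding: Sequence[float],
    small_molecule_features: Optional[Sequence[float]],
    small_molecule_dim: int,
) -> List[float]:
    """Concatenate protein and small-molecule blocks into one feature vector.

    Analogs without an attached small molecule get a zero-filled block and a
    0.0 indicator; this handling is an assumption, not taken from the paper.
    """
    if small_molecule_features is None:
        sm_block = [0.0] * small_molecule_dim
        has_small_molecule = 0.0
    else:
        if len(small_molecule_features) != small_molecule_dim:
            raise ValueError("small-molecule block has the wrong length")
        sm_block = [float(x) for x in small_molecule_features]
        has_small_molecule = 1.0
    return (
        [float(x) for x in global_descriptors]
        + amino_acid_composition(protein_sequence)
        + [float(x) for x in protein_embedding]
        + sm_block
        + [has_small_molecule]
    )


if __name__ == "__main__":
    # Human insulin A and B chains joined into one string; the descriptor and
    # embedding values are placeholders, not real data.
    insulin = "GIVEQCCTSICSLYQLENYCN" + "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"
    features = combine_representations(insulin, [1.0, 2.0], [0.1] * 8, None, 4)
    print(len(features))  # 2 + 20 + 8 + 4 + 1 = 35
```

A fixed layout matters because tree ensembles and neural networks both expect every analog to map to the same feature positions; the indicator lets a model distinguish "no attached small molecule" from a small molecule whose features happen to be zero.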
Pages: 23566-23578
Number of pages: 13