Application of Bioactivity Profile-Based Fingerprints for Building Machine Learning Models

Cited by: 23
Authors
Sturm, Noe [1]
Sun, Jiangming [1]
Vandriessche, Yves [2]
Mayr, Andreas [3,4]
Klambauer, Guenter [3,4]
Carlsson, Lars [5]
Engkvist, Ola [1]
Chen, Hongming [1]
Affiliations
[1] AstraZeneca, IMED Biotech Unit, Discovery Sci, Hit Discovery, Pepparedsleden 1, S-43153 Molndal, Sweden
[2] Intel Corp, Data Ctr Grp, Veldkant 31, B-2550 Kontich, Belgium
[3] Johannes Kepler Univ Linz, LIT AI Lab, Altenbergerstr 69, A-4040 Linz, Austria
[4] Johannes Kepler Univ Linz, Inst Machine Learning, Altenbergerstr 69, A-4040 Linz, Austria
[5] AstraZeneca, IMED Biotech Unit, Discovery Sci, Quantitat Biol, Pepparedsleden 1, S-43153 Molndal, Sweden
Funding
European Union Horizon 2020
Keywords
SIMILARITY; PREDICTION; SELECTION; DRUGS; RECOGNITION; INFORMATION; MOLECULES; MECHANISM; DISCOVERY; PATTERNS
DOI
10.1021/acs.jcim.8b00550
Chinese Library Classification number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
The volume of high-throughput screening data has increased considerably since the beginning of the era of automated biochemical and cell-based assays. This information-rich data source provides tremendous repurposing opportunities for data mining. It was recently shown that biochemical or cell-based assay results can be compiled into so-called high-throughput fingerprints (HTSFPs), a new type of descriptor that describes molecular bioactivity profiles and can be applied in virtual screening, iterative screening, and target deconvolution. So far, however, studies of HTSFPs and machine learning have mainly focused on predicting the outcome of molecules in single high-throughput assays, and no one has reported modeling compounds' biochemical assay activities toward a panel of target proteins. In this article, we compare how our in-house HTSFPs perform at this task when combined with multitask deep learning versus the single-task support vector machine method, in terms of both hit identification and scaffold-hopping potential. The performance of the two HTSFP models was reported relative to that of multitask deep learning and support vector machine models built with the structural descriptor ECFP. Moreover, we investigated the effect of high-throughput screening false positives and false negatives on the performance of the generated models. Our results showed that the two fingerprints yielded similar performance and diverse hits with very little overlap, thus demonstrating the orthogonality of bioactivity profile-based descriptors and structural descriptors. Therefore, modeling compound activity data using ECFPs together with HTSFPs increases the scaffold-hopping potential of the predictive models.
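As a rough illustration of how assay results might be compiled into a bioactivity-profile fingerprint, the sketch below builds a ternary vector per compound (active, inactive, not tested) and compares two profiles. The abstract does not describe the authors' in-house HTSFP encoding, so everything here is an assumption made for illustration: the z-score input, the activity threshold of 3.0, the Tanimoto-style similarity, and all names (`ASSAYS`, `RESULTS`, `build_htsfp`, `profile_similarity`).

```python
# Hypothetical toy data: per-compound assay z-scores.
# A missing assay key means the compound was not tested in that assay.
ASSAYS = ["assay_A", "assay_B", "assay_C", "assay_D"]
RESULTS = {
    "cmpd_1": {"assay_A": 4.2, "assay_B": 0.3, "assay_D": 3.5},
    "cmpd_2": {"assay_A": 3.9, "assay_B": -0.1, "assay_C": 0.2, "assay_D": 3.1},
    "cmpd_3": {"assay_A": 0.1, "assay_B": 3.8, "assay_C": 3.3},
}


def build_htsfp(results, assays, threshold=3.0):
    """Return one entry per assay: 1 (active), 0 (inactive), None (not tested).

    The threshold on the z-score is an illustrative assumption,
    not a value taken from the paper.
    """
    return [
        None if assay not in results else int(results[assay] >= threshold)
        for assay in assays
    ]


def profile_similarity(fp_a, fp_b):
    """Tanimoto-style similarity over assays in which both compounds were tested."""
    shared = [(x, y) for x, y in zip(fp_a, fp_b) if x is not None and y is not None]
    both_active = sum(1 for x, y in shared if x and y)
    either_active = sum(1 for x, y in shared if x or y)
    return both_active / either_active if either_active else 0.0


fingerprints = {name: build_htsfp(res, ASSAYS) for name, res in RESULTS.items()}
```

In this toy data, `cmpd_1` and `cmpd_2` share the same activity pattern across the assays in which both were tested, while `cmpd_3` is active in different assays. This kind of profile comparison does not depend on chemical structure, which is the sense in which the abstract describes such descriptors as orthogonal to structural ones.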
Pages: 962-972
Number of pages: 11