Efficient iterative virtual screening with Apache Spark and conformal prediction

Cited by: 30
Authors
Ahmed, Laeeq [1]
Georgiev, Valentin [2]
Capuccini, Marco [2, 3]
Toor, Salman [3]
Schaal, Wesley [2]
Laure, Erwin [1]
Spjuth, Ola [2]
Affiliations
[1] Royal Inst Technol KTH, Dept Computat Sci & Technol, Lindstedtsvagen 5, S-10044 Stockholm, Sweden
[2] Uppsala Univ, Dept Pharmaceut Biosci, Box 591, S-75124 Uppsala, Sweden
[3] Uppsala Univ, Dept Informat Technol, Box 337, S-75105 Uppsala, Sweden
Source
Journal of Cheminformatics, 10
Keywords
Virtual screening; Docking; Conformal prediction; Cloud computing; Apache Spark; Drug discovery; Large-scale; Benchmarking; QSAR
DOI
10.1186/s13321-018-0265-z
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Background: Docking and scoring large libraries of ligands against target proteins form the basis of structure-based virtual screening. The problem is trivially parallelizable, and calculations are generally carried out on computer clusters or on large workstations in a brute-force manner, by docking and scoring all available ligands.
Contribution: In this study we propose a strategy based on iteratively docking a set of ligands to form a training set, training a ligand-based model on this set, and predicting the remainder of the ligands to exclude those predicted as 'low-scoring'. Then another set of ligands is docked, the model is retrained, and the process is repeated until a certain model efficiency level is reached. Thereafter, the remaining ligands are docked or excluded based on this model. We use SVM and conformal prediction to deliver valid prediction intervals for ranking the predicted ligands, and Apache Spark to parallelize both the docking and the modeling.
Results: On four different targets, we show that conformal-prediction-based virtual screening (CPVS) is able to reduce the number of docked molecules by 62.61% while retaining an accuracy of 94% for the top 30 hits on average, with a speedup of 3.7. The implementation is available as open source via GitHub (https://github.com/laeeq80/spark-cpvs) and can be run on high-performance computers as well as on cloud resources.
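The iterative loop described in the abstract can be illustrated with a minimal, single-machine Python sketch. It is not the spark-cpvs implementation: everything here is a hypothetical stand-in chosen for illustration. Each ligand is reduced to one made-up descriptor `x`, `dock` is a toy scoring function rather than a real docking run, a nearest-centroid nonconformity measure replaces the SVM, and plain lists replace Spark. The batch size, the significance level `eps`, the efficiency threshold, and the "top 25% of docked scores counts as high" rule are likewise assumptions.

```python
import random
from statistics import mean

LABELS = ("low", "high")

def dock(x):
    """Hypothetical stand-in for a docking run; larger values mean better scores."""
    return -(x - 0.8) ** 2

def fit_icp(docked, high_fraction=0.25):
    """Fit a toy class-conditional (Mondrian) inductive conformal predictor."""
    scores = sorted(s for _, s in docked)
    cutoff = scores[int(len(scores) * (1 - high_fraction))]
    labelled = [(x, "high" if s >= cutoff else "low") for x, s in docked]
    half = len(labelled) // 2
    train, calib = labelled[:half], labelled[half:]
    # Nonconformity = distance to the class centroid (stand-in for an SVM margin).
    # Assumes both classes occur in the training half.
    centroid = {y: mean([x for x, lab in train if lab == y]) for y in LABELS}
    calib_alphas = {y: [abs(x - centroid[y]) for x, lab in calib if lab == y]
                    for y in LABELS}
    return centroid, calib_alphas

def predict_set(x, model, eps):
    """Return the labels whose conformal p-value exceeds the significance level eps."""
    centroid, calib_alphas = model
    region = set()
    for y in LABELS:
        alpha = abs(x - centroid[y])
        alphas = calib_alphas[y]
        p_value = (sum(a >= alpha for a in alphas) + 1) / (len(alphas) + 1)
        if p_value > eps:
            region.add(y)
    return region

def cpvs(library, batch_size=200, eps=0.2, min_efficiency=0.8, seed=0):
    """Dock in batches, retrain, and drop ligands predicted as low-scoring."""
    rng = random.Random(seed)
    pool = list(library)
    rng.shuffle(pool)
    docked = []
    while pool:
        batch, pool = pool[:batch_size], pool[batch_size:]
        docked += [(x, dock(x)) for x in batch]
        if not pool:
            break
        model = fit_icp(docked)
        regions = [(x, predict_set(x, model, eps)) for x in pool]
        # Efficiency here = share of single-label prediction sets.
        efficiency = sum(len(r) == 1 for _, r in regions) / len(regions)
        if efficiency >= min_efficiency:
            # Final pass: dock only ligands predicted 'high', exclude the rest.
            docked += [(x, dock(x)) for x, r in regions if r == {"high"}]
            break
        # Otherwise exclude confident 'low' predictions and keep iterating.
        pool = [x for x, r in regions if r != {"low"}]
    return docked

rng = random.Random(42)
library = [rng.random() for _ in range(2000)]
docked = cpvs(library)
print(f"docked {len(docked)} of {len(library)} ligands")
```

In this sketch the saving comes from two places: ligands whose prediction set is exactly `{"low"}` leave the pool after each round, and once enough prediction sets are single-label, only the `{"high"}` ligands are docked. Because later batches are drawn from an already filtered pool, the toy version does not preserve the exchangeability that conformal validity relies on; it is meant only to show the control flow.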
Pages: 8
Related papers
50 in total
  • [1] Efficient iterative virtual screening with Apache Spark and conformal prediction
    Laeeq Ahmed
    Valentin Georgiev
    Marco Capuccini
    Salman Toor
    Wesley Schaal
    Erwin Laure
    Ola Spjuth
    Journal of Cheminformatics, 10
  • [2] Efficient Performance Prediction for Apache Spark
    Cheng, Guoli
    Ying, Shi
    Wang, Bingming
    Li, Yuhang
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2021, 149 : 40 - 51
  • [3] Ensemble Learning for Large Scale Virtual Screening on Apache Spark
    Sid, Karima
    Batouche, Mohamed
    COMPUTATIONAL INTELLIGENCE AND ITS APPLICATIONS, 2018, 522 : 244 - 256
  • [4] Large-scale virtual screening on public cloud resources with Apache Spark
    Capuccini, Marco
    Ahmed, Laeeq
    Schaal, Wesley
    Laure, Erwin
    Spjuth, Ola
    JOURNAL OF CHEMINFORMATICS, 2017, 9
  • [5] Large-scale virtual screening on public cloud resources with Apache Spark
    Marco Capuccini
    Laeeq Ahmed
    Wesley Schaal
    Erwin Laure
    Ola Spjuth
    Journal of Cheminformatics, 9
  • [6] Improving Screening Efficiency through Iterative Screening Using Docking and Conformal Prediction
    Svensson, Fredrik
    Norinder, Ulf
    Bender, Andreas
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2017, 57 (03) : 439 - 444
  • [7] Distributed heterogeneous ensemble learning on Apache Spark for ligand-based virtual screening
    Sid, Karima
    Batouche, Mohamed
    INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2021, 13 (1-2) : 160 - 191
  • [8] Performance Prediction for Apache Spark Platform
    Wang, Kewen
    Khan, Mohammad Maifi Hasan
    2015 IEEE 17TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2015 IEEE 7TH INTERNATIONAL SYMPOSIUM ON CYBERSPACE SAFETY AND SECURITY, AND 2015 IEEE 12TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (ICESS), 2015, : 166 - 173
  • [9] Execution Time Prediction for Apache Spark
    Gao, Zhipeng
    Wang, Ting
    Wang, Qian
    Yang, Yang
    2018 INTERNATIONAL CONFERENCE ON COMPUTING AND BIG DATA (ICCBD 2018), 2018, : 47 - 51
  • [10] Elastic Executor Provisioning for Iterative Workloads on Apache Spark
    Yang, Donglin
    Rang, Wei
    Cheng, Dazhao
    Wang, Yu
    Tian, Jiannan
    Tao, Dingwen
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 413 - 422