Large-scale virtual screening on public cloud resources with Apache Spark

Cited by: 14
Authors
Capuccini, Marco [1, 2]
Ahmed, Laeeq [3]
Schaal, Wesley [2]
Laure, Erwin [3]
Spjuth, Ola [2]
Affiliations
[1] Uppsala Univ, Dept Informat Technol, Box 337, S-75105 Uppsala, Sweden
[2] Uppsala Univ, Dept Pharmaceut Biosci, Box 591, S-75124 Uppsala, Sweden
[3] Royal Inst Technol KTH, Dept Computat Sci & Technol, Lindstedtsvagen 5, S-10044 Stockholm, Sweden
Source
Keywords
Virtual screening; Docking; Cloud computing; Apache Spark; MapReduce
DOI
10.1186/s13321-017-0204-4
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Background: Structure-based virtual screening is an in silico method for screening a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it is a trivially parallelizable task. Most of the available parallel implementations are based on the Message Passing Interface (MPI) and rely on hardware with a low failure rate and on fast network connections. Google's MapReduce revolutionized large-scale analysis by enabling the processing of massive datasets on commodity hardware and cloud resources, with transparent scalability and fault tolerance provided at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark. Results: We developed a method to run existing docking-based screening software on distributed cloud resources using the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, by docking a publicly available target receptor against approximately 2.2 M compounds. The performance experiments show good parallel efficiency (87%) when running in a public cloud environment. Conclusion: Our method enables parallel structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve makes it possible to try out the method on relatively small libraries first and then scale up to larger libraries.
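The MapReduce pattern named in the abstract can be illustrated with a minimal plain-Python sketch. It is not the authors' Spark implementation and does not call Spark or any docking program: `mock_dock`, `split_library`, `dock_partition`, `merge_top` and `screen` are hypothetical names, the scoring function is a deterministic placeholder, and the "lower score is better" convention is an assumption made for the example. The map step scores each partition of the library independently and keeps a local top-N list; the reduce step merges those lists pairwise.

```python
import heapq
from functools import reduce
from typing import Callable, List, Tuple

# (score, molecule identifier); lower score is treated as better (assumption).
Scored = Tuple[float, str]


def mock_dock(molecule: str) -> float:
    """Hypothetical stand-in for an external docking program.

    Returns a deterministic fake score so the sketch runs without any
    chemistry software installed.
    """
    return -float(sum(ord(c) for c in molecule) % 100) / 10.0


def split_library(library: List[str], n_parts: int) -> List[List[str]]:
    """Split the library into n_parts roughly equal partitions."""
    return [library[i::n_parts] for i in range(n_parts)]


def dock_partition(molecules: List[str],
                   dock: Callable[[str], float],
                   top_n: int) -> List[Scored]:
    """Map step: score every molecule in one partition, keep the local top N."""
    scored = [(dock(m), m) for m in molecules]
    return heapq.nsmallest(top_n, scored)


def merge_top(a: List[Scored], b: List[Scored], top_n: int) -> List[Scored]:
    """Reduce step: merge two partial top-N lists into one."""
    return heapq.nsmallest(top_n, a + b)


def screen(library: List[str],
           dock: Callable[[str], float],
           n_parts: int = 4,
           top_n: int = 3) -> List[Scored]:
    """Run the map step per partition, then fold the partial results."""
    partial = [dock_partition(p, dock, top_n)
               for p in split_library(library, n_parts)]
    return reduce(lambda a, b: merge_top(a, b, top_n), partial, [])


if __name__ == "__main__":
    # Placeholder SMILES strings, used only as identifiers here.
    library = ["CCO", "c1ccccc1", "CC(=O)O", "CCN", "CCCC", "O=C=O", "CN"]
    for score, molecule in screen(library, mock_dock):
        print(f"{molecule}\t{score}")
```

Because each partition is scored independently and the merge is associative, the result does not depend on how the library is split, which is the property that makes the task trivially parallelizable. In a real deployment the partitions would be distributed across worker nodes rather than processed in a local loop.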
Pages: 6
Related papers
50 items in total
  • [31] Remote Attestation of Large-scale Virtual Machines in the Cloud Data Center
    Chene, Jie
    Zhang, Kun
    Tu, Bibo
    2021 IEEE 20TH INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS (TRUSTCOM 2021), 2021, : 180 - 187
  • [32] An Apache Spark Implementation of Block Power Method for Computing Dominant Eigenvalues and Eigenvectors of Large-Scale Matrices
    Ji, Hao
    Weinberg, Seth H.
    Li, Min
    Wang, Jianxin
    Li, Yaohang
    PROCEEDINGS OF 2016 IEEE INTERNATIONAL CONFERENCES ON BIG DATA AND CLOUD COMPUTING (BDCLOUD 2016) SOCIAL COMPUTING AND NETWORKING (SOCIALCOM 2016) SUSTAINABLE COMPUTING AND COMMUNICATIONS (SUSTAINCOM 2016) (BDCLOUD-SOCIALCOM-SUSTAINCOM 2016), 2016, : 554 - 559
  • [33] Leveraging cloud computing for large-scale QM calculations: Application to virtual screening and structure-based design
    Rai, Brajesh
    Sresht, Vishnu
    Yang, Qingyi
    Unwalla, Ray
    Tu, Meihua
    Mathiowetz, Alan
    Bakken, Gregory
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2018, 255
  • [34] Large-Scale Learning with AdaGrad on Spark
    Hadgu, Asmelash Teka
    Nigam, Aastha
    Diaz-Aviles, Ernesto
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015, : 2828 - 2830
  • [35] How Different are the Cloud Workloads? Characterizing Large-Scale Private and Public Cloud Workloads
    Qin, Xiaoting
    Ma, Minghua
    Zhao, Yuheng
    Zhang, Jue
    Du, Chao
    Liu, Yudong
    Parayil, Anjaly
    Bansal, Chetan
    Rajmohan, Saravan
    Goiri, Inigo
    Cortez, Eli
    Qin, Si
    Lin, Qingwei
    Zhang, Dongmei
    2023 53RD ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS, DSN, 2023, : 522 - 530
  • [36] Large-Scale Docking in the Cloud
    Tingle, Benjamin I.
    Irwin, John J.
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2023, 63 (09) : 2735 - 2741
  • [37] An optimized emergency resources allocation algorithm for large-scale public emergency
    Wang, Su-Sheng
    Wang, Yan
    Sun, Jian
    PROCEEDINGS OF 2007 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2007, : 119 - +
  • [38] Discovery of Immunoproteasome Inhibitors Using Large-Scale Covalent Virtual Screening
    Scarpino, Andrea
    Bajusz, David
    Proj, Matic
    Gobec, Martina
    Sosic, Izidor
    Gobec, Stanislav
    Ferenczy, Gyoergy G.
    Keseru, Gyoergy M.
    MOLECULES, 2019, 24 (14):
  • [39] Resources scheduling strategy of very large-scale terrain based on cloud computing
    Zeng, Y. (zyyhost@126.com), 1600, ICIC Express Letters Office, Tokai University, Kumamoto Campus, 9-1-1, Toroku, Kumamoto, 862-8652, Japan (06):
  • [40] Large-scale screening on small scale
    Figeys, D
    TRENDS IN BIOTECHNOLOGY, 2000, 18 (09) : 363 - 364