Scaling Machine Learning for Target Prediction in Drug Discovery using Apache Spark

Cited by: 5

Authors:

Harnie, Dries [1,3]

Vapirev, Alexander E. [2,3]

Wegner, Jorg Kurt [2]

Gedich, Andrey [6]

Steijaert, Marvin [7]

Wuyts, Roel [3,4,5]

De Meuter, Wolfgang [1]

Affiliations:

[1] Vrije Univ Brussel, Software Languages Lab, Pl Laan 2, B-1050 Brussels, Belgium

[2] Janssen Pharmaceut, B-2340 Beerse, Belgium

[3] ExaSci Life Lab, B-3001 Leuven, Belgium

[4] IMEC, B-3001 Leuven, Belgium

[5] Katholieke Univ Leuven, DistriNet, B-3001 Leuven, Belgium

[6] ARCADIA Inc, Rostra Business Ctr, St Petersburg 195112, Russia

[7] OpenAnalytics, B-2220 Heist Op Den Berg, Belgium

Source:

2015 15TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING | 2015

Keywords:

IDENTIFICATION; TOOL;

DOI:

10.1109/CCGrid.2015.50

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

In the context of drug discovery, a key problem is the identification of candidate molecules that affect proteins associated with diseases. Inside Janssen Pharmaceutica, the Chemogenomics project aims to derive new candidates from existing experiments through a set of machine learning predictor programs, written in single-node C++. These programs take a long time to run and are inherently parallel, but do not use multiple nodes. We show how we reimplemented the pipeline using Apache Spark, which enabled us to lift the existing programs to a multi-node cluster without making changes to the predictors. We have benchmarked our Spark pipeline against the original, which shows almost linear speedup up to 8 nodes. In addition, our pipeline generates fewer intermediate files while allowing easier checkpointing and monitoring.


Pages: 871-879

Number of pages: 9
