Scaling Machine Learning for Target Prediction in Drug Discovery using Apache Spark

被引:5
|
作者
Harnie, Dries [1 ,3 ]
Vapirev, Alexander E. [2 ,3 ]
Wegner, Jorg Kurt [2 ]
Gedich, Andrey [6 ]
Steijaert, Marvin [7 ]
Wuyts, Roel [3 ,4 ,5 ]
De Meuter, Wolfgang [1 ]
机构
[1] Vrije Univ Brussel, Software Languages Lab, Pl Laan 2, B-1050 Brussels, Belgium
[2] Janssen Pharmaceut, B-2340 Beerse, Belgium
[3] ExaSci Life Lab, B-3001 Leuven, Belgium
[4] IMEC, B-3001 Leuven, Belgium
[5] Katholieke Univ Leuven, DistriNet, B-3001 Leuven, Belgium
[6] ARCADIA Inc, Rostra Business Ctr, St Petersburg 195112, Russia
[7] OpenAnalytics, B-2220 Heist Op Den Berg, Belgium
关键词
IDENTIFICATION; TOOL;
D O I
10.1109/CCGrid.2015.50
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
In the context of drug discovery, a key problem is the identification of candidate molecules that affect proteins associated with diseases. Inside Janssen Pharmaceutica, the Chemogenomics project aims to derive new candidates from existing experiments through a set of machine learning predictor programs, written in single-node C++. These programs take a long time to run and are inherently parallel, but do not use multiple nodes. We show how we reimplemented the pipeline using Apache Spark, which enabled us to lift the existing programs to a multi-node cluster without making changes to the predictors. We have benchmarked our Spark pipeline against the original, which shows almost linear speedup up to 8 nodes. In addition, our pipeline generates fewer intermediate files while allowing easier checkpointing and monitoring.
引用
收藏
页码:871 / 879
页数:9
相关论文
共 50 条
  • [1] Scaling machine learning for target prediction in drug discovery using Apache Spark
    Harnie, Dries
    Saey, Mathijs
    Vapirev, Alexander E.
    Wegner, Jorg Kurt
    Gedich, Andrey
    Steijaert, Marvin
    Ceulemans, Hugo
    Wuyts, Roel
    De Meuter, Wolfgang
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2017, 67 : 409 - 417
  • [2] Prediction of Drug Target Sensitivity in Cancer Cell Lines Using Apache Spark
    Hussain, Shahid
    Ferzund, Javed
    Raza-Ul-Haq
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 2019, 26 (08) : 882 - 889
  • [3] MLlib: Machine learning in Apache Spark
    Meng, Xiangrui
    Bradley, Joseph
    Yavuz, Burak
    Sparks, Evan
    Venkataraman, Shivaram
    Liu, Davies
    Freeman, Jeremy
    Tsai, D.B.
    Amde, Manish
    Owen, Sean
    Xin, Doris
    Xin, Reynold
    Franklin, Michael J.
    Zadeh, Reza
    Zaharia, Matei
    Talwalkar, Ameet
    [J]. Journal of Machine Learning Research, 2016, 17
  • [4] MLlib: Machine Learning in Apache Spark
    Meng, Xiangrui
    Bradley, Joseph
    Yavuz, Burak
    Sparks, Evan
    Venkataraman, Shivaram
    Liu, Davies
    Freeman, Jeremy
    Tsai, D. B.
    Amde, Manish
    Owen, Sean
    Xin, Doris
    Xin, Reynold
    Franklin, Michael J.
    Zadeh, Reza
    Zaharia, Matei
    Talwalkar, Ameet
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17
  • [5] Big Data Machine Learning using Apache Spark MLlib
    Assefi, Mehdi
    Behravesh, Ehsun
    Liu, Guangchi
    Tafti, Ahmad P.
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017, : 3492 - 3498
  • [6] Applications of Machine Learning in Drug Target Discovery
    Gao, Dongrui
    Chen, Qingyuan
    Zeng, Yuanqi
    Jiang, Meng
    Zhang, Yongqing
    [J]. CURRENT DRUG METABOLISM, 2020, 21 (10) : 790 - 803
  • [7] Machine learning for target discovery in drug development
    Rodrigues, Tiago
    Bernardes, Goncalo J. L.
    [J]. CURRENT OPINION IN CHEMICAL BIOLOGY, 2020, 56 : 16 - 22
  • [8] Prediction of Cardiovascular Risk Using Extreme Learning Machine-Tree Classifier on Apache Spark Cluster
    Jaya Lakshmi, A.
    Venkatramaphanikumar, S.
    Kolli, Venkata K. K.
    [J]. Recent Advances in Computer Science and Communications, 2022, 15 (03) : 443 - 455
  • [9] Limits of Prediction for Machine Learning in Drug Discovery
    von Korff, Modest
    Sander, Thomas
    [J]. FRONTIERS IN PHARMACOLOGY, 2022, 13
  • [10] Predicting Diabetes using Distributed Machine Learning based on Apache Spark
    Ahmed, Hager
    Younis, Eman M. G.
    Ali, Abdelmgeid A.
    [J]. PROCEEDINGS OF 2020 INTERNATIONAL CONFERENCE ON INNOVATIVE TRENDS IN COMMUNICATION AND COMPUTER ENGINEERING (ITCE), 2020, : 44 - 49