Parallelizing the execution of native data mining algorithms for computational biology

Cited by: 31
Authors
Coro, Gianpaolo [1]
Candela, Leonardo [1]
Pagano, Pasquale [1]
Italiano, Angela [1]
Liccardo, Loredana [1]
Affiliation
[1] Ist Sci & Tecnol Informaz A Faedo CNR, I-56124 Pisa, Italy
Source
Keywords
data mining; parallel processing; cloud computing; computational biology; distributed systems; R; ENVELOPE MODELS; CLIMATE-CHANGE; LIFE-HISTORY; ENVIRONMENT; TRAITS;
DOI
10.1002/cpe.3435
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Data mining is being increasingly used in biology. Biologists are adopting prototyping languages, like R and Matlab, to facilitate the application of data mining algorithms to their data. As a result, their scripts are becoming increasingly complex and also require frequent updates. Application to large datasets becomes impractical and the time-to-paper increases. Furthermore, even if there are various systems that can be used to efficiently process large datasets, for example, using Cloud and High Performance Computing, they usually require procedures to be translated into specific languages or to be adapted to a certain computing platform. Such modifications can speed up the processing, but translation is not automatic, especially in complex cases, and can require a large amount of programming effort and accurate validation. In this paper, we propose an approach to parallelize data mining procedures in the form of compiled software or R scripts developed by biology communities of practice. Our approach requires minimal alteration of the original code. In many cases, there is no need for code modification. Furthermore, it allows for fast updating when a new version is ready. We clarify the constraints and the benefits of our method and report a practical use case to demonstrate such benefits compared with a standard execution. Our approach relies on a distributed network of web services and ultimately exposes the algorithms as-a-Service, to be invoked by remote thin clients. Copyright (c) 2014 John Wiley & Sons, Ltd.
Pages: 4630-4644
Page count: 15
Related papers
50 in total
  • [1] Data Mining Algorithms Parallelizing in Functional Programming Language for Execution in Cluster
    Kholod, Ivan
    Malov, Aleksey
    Rodionov, Sergey
    [J]. INTERNET OF THINGS, SMART SPACES, AND NEXT GENERATION NETWORKS AND SYSTEMS, 2015, 9247 : 140 - 151
  • [2] Scalable Data Mining Algorithms in Computational Biology and Biomedicine
    Zou, Quan
    Mrozek, Dariusz
    Ma, Qin
    Xu, Yungang
    [J]. BIOMED RESEARCH INTERNATIONAL, 2017, 2017
  • [3] Runtime support for parallelizing data mining algorithms
    Jin, RM
    Agrawal, G
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY: THEORY, TOOLS AND TECHNOLOGY IV, 2002, 4730 : 212 - 223
  • [4] A Functional Approach to Parallelizing Data Mining Algorithms in Java
    Kholod, Ivan
    Shorov, Andrey
    Gorlatch, Sergei
    [J]. PARALLEL COMPUTING TECHNOLOGIES (PACT 2017), 2017, 10421 : 459 - 472
  • [5] Framework for Multi Threads Execution of Data Mining Algorithms
    Kholod, Ivan
    [J]. PROCEEDINGS OF THE 2015 IEEE NORTH WEST RUSSIA SECTION YOUNG RESEARCHERS IN ELECTRICAL AND ELECTRONIC ENGINEERING CONFERENCE (2015 ELCONRUSNW), 2015, : 82 - 88
  • [6] Application of Actor Model for Distributed Execution of Data Mining Algorithms
    Kapustin, Nikita
    Kholod, Ivan
    Petuhov, Ilya
    [J]. 2015 XVIII INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND MEASUREMENTS (SCM), 2015, : 208 - 210
  • [7] Toward complete genome data mining in computational biology
    Ukkonen, E
    [J]. ALGORITHM THEORY - SWAT 2000, 2000, 1851 : 20 - 21
  • [8] Parallelizing evolutionary algorithms for clustering data
    Kwedlo, Wojciech
    [J]. PARALLEL PROCESSING AND APPLIED MATHEMATICS, 2006, 3911 : 430 - 438
  • [9] Data Mining Algorithms Parallelization in Logic Programming Framework for Execution in Cluster
    Malov, Aleksey
    Rodionov, Sergey
    Shorov, Andrey
    [J]. INTERNET OF THINGS, SMART SPACES, AND NEXT GENERATION NETWORKS AND SYSTEMS, NEW2AN 2019, RUSMART 2019, 2019, 11660 : 91 - 103
  • [10] Creation of Data Mining Algorithms as Functional Expression for Parallel and Distributed Execution
    Kholod, Ivan
    Petukhov, Ilya
    [J]. PARALLEL COMPUTING TECHNOLOGIES (PACT 2015), 2015, 9251 : 62 - 67