Parallel data intensive applications using MapReduce: a data mining case study in biomedical sciences

Cited by: 0
Authors
Liangxiu Han
Hwee Yong Ong
Affiliations
[1] Manchester Metropolitan University,School of Computing, Mathematics and Digital Technology
[2] University of Edinburgh,School of Informatics
Source
Cluster Computing | 2015 / Vol. 18
Keywords
Data-intensive computing; Parallel processing; MapReduce; Cloud computing; Data mining application in biomedical science;
DOI
Not available
Abstract
Performance is an open issue in data-intensive applications (e.g. data mining tasks). Parallel and distributed computing systems (e.g. multicore computing, grid computing, cloud computing, etc.), along with hybrid programming models (e.g. MapReduce, MPI, etc.), are seen as a sought-after solution for accelerating data-intensive applications. One of the main challenges is how to exploit these advanced technologies effectively to facilitate fundamental scientific discoveries such as those in the Biomedical Sciences. This paper explores how MapReduce and Cloud computing can accelerate the performance of data-intensive applications through a real data mining use case in the Biomedical Sciences. We first adapted the data mining task to the MapReduce model and then deployed it onto the Cloud. We built an analytic model based on the MapReduce computations to evaluate the efficiency and performance of the prototype. The results, from both the experiments and the evaluation model, show that performance and scalability can be enhanced through these advanced technologies.
Pages: 403 - 418
Number of pages: 15
Related papers
50 in total
  • [31] PARALLEL KNOWLEDGE ACQUISITION ALGORITHM FOR BIG DATA USING MAPREDUCE
    Qian, Jin
    Xia, Min
    Lv, Ping
    PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOL. 1, 2015, : 316 - 321
  • [32] Parallel knowledge acquisition algorithms for big data using MapReduce
    Jin Qian
    Min Xia
    Xiaodong Yue
    International Journal of Machine Learning and Cybernetics, 2018, 9 : 1007 - 1021
  • [33] Efficient Results Merging for Parallel Data Clustering Using MapReduce
    Bousbaci, Abdelhak
    Kamel, Nadjet
    DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE, (DCAI 2016), 2016, 474 : 349 - 357
  • [34] Parallel knowledge acquisition algorithms for big data using MapReduce
    Qian, Jin
    Xia, Min
    Yue, Xiaodong
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2018, 9 (06) : 1007 - 1021
  • [35] Parallel data intensive computing in scientific and commercial applications
    Cannataro, M
    Talia, D
    Srimani, PK
    PARALLEL COMPUTING, 2002, 28 (05) : 673 - 704
  • [36] Bucket MapReduce: Relieving the Disk I/O Intensity of Data-Intensive Applications in MapReduce Frameworks
    Chen, Kai-Hsun
    Chen, Hsin-Yuan
    Wang, Chien-Min
    2021 29TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING (PDP 2021), 2021, : 18 - 25
  • [37] Design of Self-Adjusting algorithm for data-intensive MapReduce Applications
    Nagiwale, Amin Nazir
    Umale, Manish R.
    Sinha, Aditya Kumar
    2015 INTERNATIONAL CONFERENCE ON ENERGY SYSTEMS AND APPLICATIONS, 2015, : 506 - 510
  • [38] Application of data mining for biomedical data processing
    Shon H.-S.
    Kim K.-O.
    Cha E.-J.
    Kim K.-A.
    Transactions of the Korean Institute of Electrical Engineers, 2016, 65 (07): : 1236 - 1241
  • [39] Mining Biomedical Ontologies and Data Using RDF Hypergraphs
    Liu, Haishan
    Dou, Dejing
    Jin, Ruoming
    LePendu, Paea
    Shah, Nigam
    2013 12TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2013), VOL 1, 2013, : 141 - 146
  • [40] SmartPortal™ for biomedical data mining
    Buczak, Anna L.
    Wan, Charles
    Petry, Glenn
    2007 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DATA MINING, VOLS 1 AND 2, 2007, : 221 - 227