Parallel data intensive applications using MapReduce: a data mining case study in biomedical sciences

Authors
Liangxiu Han
Hwee Yong Ong
Affiliations
[1] Manchester Metropolitan University, School of Computing, Mathematics and Digital Technology
[2] University of Edinburgh, School of Informatics
Source
Cluster Computing, 2015, Vol. 18
Keywords
Data-intensive computing; Parallel processing; MapReduce; Cloud computing; Data mining application in biomedical science
DOI
Not available
Abstract
Performance is an open issue in data-intensive applications (e.g., data mining tasks). Parallel and distributed computing systems (e.g., multicore, grid, and cloud computing), together with hybrid programming models (e.g., MapReduce and MPI), are seen as sought-after solutions for accelerating data-intensive applications. One of the main challenges is how to exploit these advanced technologies effectively to facilitate fundamental scientific discoveries, such as those in the biomedical sciences. This paper explores how MapReduce and cloud computing can accelerate the performance of data-intensive applications through a real data mining use case in the biomedical sciences. We first adapted the data mining task to the MapReduce model and then deployed it on the cloud. We built an analytic model based on the MapReduce computations to evaluate the efficiency and performance of the prototype. The results from both the experiments and the evaluation model show that performance and scalability can be enhanced through these advanced technologies.
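To make the MapReduce model mentioned in the abstract concrete, here is a minimal, single-machine sketch of its three phases (map, shuffle, reduce). It is a generic illustration under stated assumptions, not the authors' adapted data mining task or their cloud deployment: the function names (`map_reduce`, `token_mapper`, `sum_reducer`) and the toy token-counting records are hypothetical, and a thread pool merely stands in for the worker nodes a framework such as Hadoop would use.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Hashable, Iterable, List, Tuple

Pair = Tuple[Hashable, Any]


def map_reduce(records: Iterable[Any],
               mapper: Callable[[Any], Iterable[Pair]],
               reducer: Callable[[Hashable, List[Any]], Any],
               max_workers: int = 4) -> Dict[Hashable, Any]:
    """Toy in-process MapReduce job (hypothetical helper, illustration only)."""
    # Map phase: each record is processed independently, so the calls can
    # run concurrently; the thread pool stands in for cluster workers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        mapped = list(pool.map(lambda r: list(mapper(r)), records))

    # Shuffle phase: gather every value emitted under the same key.
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: collapse each key's list of values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical usage: count how often each token occurs across records.
def token_mapper(line: str) -> Iterable[Pair]:
    for token in line.split():
        yield token, 1


def sum_reducer(key: Hashable, values: List[int]) -> int:
    return sum(values)


records = ["gene_a gene_b", "gene_b gene_c", "gene_b"]
counts = map_reduce(records, token_mapper, sum_reducer)
print(counts)  # {'gene_a': 1, 'gene_b': 3, 'gene_c': 1}
```

The sketch only shows the structure of the model: because CPython threads share one interpreter lock, it will not speed up CPU-bound work. Real gains of the kind the abstract reports come from running map and reduce tasks as separate processes across many machines, which is what MapReduce frameworks on the cloud provide.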
Pages: 403-418
Number of pages: 15