Parallel data intensive applications using MapReduce: a data mining case study in biomedical sciences

Cited by: 0
Authors
Liangxiu Han
Hwee Yong Ong
Affiliations
[1] Manchester Metropolitan University, School of Computing, Mathematics and Digital Technology
[2] University of Edinburgh, School of Informatics
Source
Cluster Computing | 2015 / Vol. 18
Keywords
Data-intensive computing; Parallel processing; MapReduce; Cloud computing; Data mining application in biomedical science
DOI
Not available
Abstract
Performance is an open issue in data-intensive applications (e.g. data mining tasks). Parallel and distributed computing systems (e.g. multicore computing, grid computing, cloud computing, etc.), along with hybrid programming models (e.g. MapReduce, MPI, etc.), are seen as a sought-after solution for accelerating data-intensive applications. One of the main challenges is how to exploit these advanced technologies effectively in facilitating fundamental scientific discoveries such as those in the Biomedical Sciences. This paper explores how MapReduce and Cloud computing can accelerate the performance of data-intensive applications through a real data mining use case in the Biomedical Sciences. We first adapted the data mining task to the MapReduce model and then deployed it onto the Cloud. We built an analytic model based on the MapReduce computations to evaluate the efficiency and performance of the prototype. The results, from both the experiments and the evaluation model, show that performance and scalability can be enhanced through these advanced technologies.
Pages: 403 - 418
Number of pages: 15