Cloud Based K-Means Clustering Running as a MapReduce Job for Big Data Healthcare Analytics Using Apache Mahout

Cited by: 7
Authors
Rallapalli, Sreekanth [1]
Gondkar, R. R. [2]
Rao, Golajapu Venu Madhava [3]
Affiliations
[1] Bharathiyar Univ, R&D Ctr, Coimbatore, Tamil Nadu, India
[2] AIT, Bangalore, Karnataka, India
[3] Botho Univ, Gaborone, Botswana
Keywords
Big data; Clustering; Hadoop; K-means; Mahout; NoSQL
DOI
10.1007/978-81-322-2755-7_14
CLC number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Growth in data volume and the need for analytics have driven innovation in big data. To speed up query responses, models such as NoSQL have emerged. Virtualized platforms that run Hadoop on commodity hardware help small and midsized companies use cloud environments, which lowers the cost of data processing and analytics. Because health care generates data in large volume and variety, parallel algorithms are required that can handle petabytes of data using Hadoop and MapReduce parallel processing. K-means clustering is one method that can be parallelized in this way. Building an accurate system requires large data sets, but memory requirements grow with data size and algorithms become slow. Mahout's scalable algorithms work better with huge data sets and improve system performance. Mahout is open source and can be used to solve problems that arise with huge data sets. This paper proposes cloud-based K-means clustering running as a MapReduce job. We cluster health care data in the cloud and then compare the results across various measures to determine the best measure for finding the number of vectors in a given cluster.
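The abstract describes K-means running as a MapReduce job. The following is a minimal sketch of that idea, assuming plain in-memory Python in place of Hadoop; it is not the paper's implementation and does not use Mahout's API. The names map_phase, reduce_phase, and kmeans, the Euclidean distance default, and the sample points are all hypothetical and chosen for illustration only.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available in Python 3.8+


def map_phase(points, centroids, distance=dist):
    """Map step: emit (index of nearest centroid, point) for every point."""
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: distance(p, centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce step: average the points grouped under each centroid index."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    # Assumption: a centroid with no assigned points keeps its old position.
    updated = list(centroids)
    for idx, members in groups.items():
        updated[idx] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return updated


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Repeat map and reduce until the centroids stop moving."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        shift = max(dist(old, new) for old, new in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


# Hypothetical two-dimensional sample data with two obvious groups.
sample_points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
final_centroids = kmeans(sample_points, [(0.0, 0.0), (10.0, 10.0)])
```

In a real Hadoop job, each map task would see only its own split of the data and the framework would group the emitted pairs by key before the reduce step; here the grouping is done inside reduce_phase for brevity. Passing a different function as the distance argument of map_phase is one way to try other measures.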
Pages: 127-135
Page count: 9
Related papers
50 items in total
  • [1] Optimized big data K-means clustering using MapReduce
    Cui, Xiaoli
    Zhu, Pingfei
    Yang, Xin
    Li, Keqiu
    Ji, Changqing
    [J]. JOURNAL OF SUPERCOMPUTING, 2014, 70 (03): : 1249 - 1259
  • [2] Optimized big data K-means clustering using MapReduce
    Xiaoli Cui
    Pingfei Zhu
    Xin Yang
    Keqiu Li
    Changqing Ji
    [J]. The Journal of Supercomputing, 2014, 70 : 1249 - 1259
  • [3] Efficient MapReduce Kernel k-Means for Big Data Clustering
    Tsapanos, Nikolaos
    Tefas, Anastasios
    Nikolaidis, Nikolaos
    Pitas, Ioannis
    [J]. 9TH HELLENIC CONFERENCE ON ARTIFICIAL INTELLIGENCE (SETN 2016), 2016,
  • [4] Canopy with k-means Clustering Algorithm for Big Data Analytics
    Sagheer, Noor S.
    Yousif, Suhad A.
    [J]. FOURTH INTERNATIONAL CONFERENCE OF MATHEMATICAL SCIENCES (ICMS 2020), 2021, 2334
  • [5] K-Means Parallel Algorithm of Big Data Clustering Based on Mapreduce PCAM Method
    Li, Yongyi
    Yang, Zhongqiang
    Han, Kaixu
    [J]. Engineering Intelligent Systems, 2021, 29 (06): : 411 - 418
  • [6] Analysis of big data job requirements based on K-means text clustering in China
    Debao, Dai
    Yinxia, Ma
    Min, Zhao
    [J]. PLOS ONE, 2021, 16 (08):
  • [7] A Novel MapReduce Based k-Means Clustering
    Sinha, Ankita
    Jana, Prasanta K.
    [J]. PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND COMMUNICATION, 2017, 458 : 247 - 255
  • [8] Parallel K-Means Clustering Based on MapReduce
    Zhao, Weizhong
    Ma, Huifang
    He, Qing
    [J]. CLOUD COMPUTING, PROCEEDINGS, 2009, 5931 : 674 - 679
  • [9] Big Data Analytics Model for Distributed Document Using Hybrid Optimization with K-Means Clustering
    Sharma, Kapil
    Saini, Satish
    Sharma, Shailja
    Kang, Hardeep Singh
    Bouye, Mohamed
    Krah, Daniel
    [J]. WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2022, 2022
  • [10] Improving Clustering Efficiency by SimHash-based K-Means Algorithm for Big Data Analytics
    Wang, Jenq-Haur
    Lin, Jia-Zhi
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 1881 - 1888