Cloud Based K-Means Clustering Running as a MapReduce Job for Big Data Healthcare Analytics Using Apache Mahout

Cited by: 7

Authors:

Rallapalli, Sreekanth [1]

Gondkar, R. R. [2]

Rao, Golajapu Venu Madhava [3]

Affiliations:

[1] Bharathiyar Univ, R&D Ctr, Coimbatore, Tamil Nadu, India

[2] AIT, Bangalore, Karnataka, India

[3] Botho Univ, Gaborone, Botswana

Source:

INFORMATION SYSTEMS DESIGN AND INTELLIGENT APPLICATIONS, VOL 1, INDIA 2016 | 2016 / Vol. 433

Keywords:

Big data; Clustering; Hadoop; K-means; Mahout; NoSQL

DOI:

10.1007/978-81-322-2755-7_14

CLC classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The growth in data volume and the need for analytics have led to innovation in big data. To speed up query responses, models such as NoSQL have emerged. Virtualized platforms built on commodity hardware and running Hadoop help small and midsized companies use a cloud environment, which helps organizations reduce the cost of data processing and analytics. Because health care generates large volumes and a wide variety of data, parallel algorithms that can support petabytes of data using Hadoop and MapReduce parallel processing are required. K-means clustering is one method that can be implemented as such a parallel algorithm. Building an accurate system requires considering large data sets, but memory requirements grow with large data sets and algorithms become slow. The scalable algorithms developed in Mahout work better with huge data sets and improve system performance. Mahout is open source and can be used to solve problems that arise with huge data sets. This paper proposes cloud-based K-means clustering running as a MapReduce job. We use health care data on the cloud for clustering. We then compare the results obtained with various measures to determine the best measure for finding the number of vectors in a given cluster.
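To illustrate how one K-means iteration maps onto MapReduce, here is a minimal single-process Python sketch. It is not Mahout's implementation or API. The map step assigns each vector to its nearest centroid, a dictionary stands in for Hadoop's shuffle, and the reduce step averages each group into a new centroid. The distance function is pluggable, so results under different measures can be compared by the number of vectors per cluster. The two measures shown (Euclidean and Manhattan), the function names, and the toy data are all hypothetical choices for illustration.

```python
import math
from collections import defaultdict

# Example distance measures (assumed for illustration; not necessarily the paper's set).
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def kmeans_map(point, centroids, dist):
    """Map step: emit (index of nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point

def kmeans_reduce(points):
    """Reduce step: the new centroid is the component-wise mean of its points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]

def kmeans_iteration(data, centroids, dist):
    # Simulated shuffle: group mapper output by key (centroid index).
    groups = defaultdict(list)
    for point in data:
        key, value = kmeans_map(point, centroids, dist)
        groups[key].append(value)
    # An empty cluster keeps its previous centroid.
    new_centroids = [kmeans_reduce(groups[i]) if groups[i] else centroids[i]
                     for i in range(len(centroids))]
    sizes = {i: len(groups[i]) for i in range(len(centroids))}
    return new_centroids, sizes

def kmeans(data, centroids, dist, max_iter=20, tol=1e-6):
    """Repeat map/shuffle/reduce until centroids stop moving (shift measured in Euclidean terms)."""
    sizes = {}
    for _ in range(max_iter):
        new_centroids, sizes = kmeans_iteration(data, centroids, dist)
        shift = max(euclidean(c, n) for c, n in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, sizes

if __name__ == "__main__":
    # Hypothetical toy vectors, not the paper's health care data.
    data = [[1.0, 1.0], [1.5, 2.0], [1.2, 0.8], [8.0, 8.0], [8.5, 9.0], [9.0, 8.2]]
    initial = [[1.0, 1.0], [8.0, 8.0]]
    for name, dist in [("euclidean", euclidean), ("manhattan", manhattan)]:
        _, sizes = kmeans(data, initial, dist)
        print(name, sizes)  # prints e.g. "euclidean {0: 3, 1: 3}"
```

In a real Hadoop deployment, each call to `kmeans_iteration` would correspond to one MapReduce job, with the current centroids made available to every mapper and the reducers' output becoming the centroids for the next job.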


Pages: 127-135

Page count: 9
