Clustering on Big Data Using Hadoop MapReduce

Cited by: 3
Authors
Akthar, Nadeem [1]
Ahamad, Mohd Vasim [1]
Khan, Shahbaz [1]
Affiliation
[1] Aligarh Muslim Univ, ZHCET, Dept Comp Engn, Aligarh 202002, Uttar Pradesh, India
Keywords
Clustering; Big Data; K-Means Clustering; Hadoop; MapReduce; Data Mining;
DOI
10.1109/CICN.2015.161
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the phenomenal increase in digital data, it is inefficient to run traditional clustering algorithms on individual servers. To deal with this problem, researchers are migrating to distributed environments to implement traditional clustering algorithms, most notably K-means clustering. Traditional K-means clustering suffers from instability caused by random initial centers: if the algorithm is executed more than once on the same data set with random initial centers, it can yield different cluster results each time. Here, we propose a modified K-means clustering algorithm that selects optimized initial centers based on data dimensional density. This approach avoids random initial centers and provides stable cluster results.
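The instability described in the abstract can be illustrated with a minimal Python sketch of one K-means iteration split into map and reduce steps. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the density-based centers are computed, so `deterministic_init` is a hypothetical stand-in that simply picks evenly spaced points from the sorted data. All function names here are made up for the example, and the map and reduce steps run in a single process rather than on Hadoop.

```python
import random
from collections import defaultdict


def nearest(point, centers):
    """Return the index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def map_phase(points, centers):
    """Map step: emit (cluster index, point) for every input point."""
    for point in points:
        yield nearest(point, centers), point


def reduce_phase(pairs, centers):
    """Reduce step: average the points of each cluster to get the new centers."""
    groups = defaultdict(list)
    for idx, point in pairs:
        groups[idx].append(point)
    new_centers = []
    for i, old in enumerate(centers):
        members = groups.get(i)
        if not members:
            new_centers.append(old)  # an empty cluster keeps its previous center
        else:
            new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return new_centers


def kmeans(points, centers, max_iter=100):
    """Repeat map and reduce until the centers stop changing."""
    for _ in range(max_iter):
        new_centers = reduce_phase(map_phase(points, centers), centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def random_init(points, k, seed):
    """Traditional initialization: k points chosen at random (depends on the seed)."""
    return random.Random(seed).sample(points, k)


def deterministic_init(points, k):
    """Hypothetical stand-in for a density-based choice (NOT the paper's method):
    sort the points and take k evenly spaced ones, so every run starts the same."""
    ordered = sorted(points)
    step = len(ordered) / k
    return [ordered[int(step * i + step / 2)] for i in range(k)]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(kmeans(points, deterministic_init(points, 2)))
print(kmeans(points, random_init(points, 2, seed=1)))
```

With `random_init`, the starting centers depend on the seed, so repeated runs may converge to different clusterings. With a deterministic starting rule, repeated runs on the same data set begin from the same centers and therefore return the same result.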
Pages: 789-795
Page count: 7