Kernelized Spectral Clustering based Conditional MapReduce function with big data

Cited by: 1
Authors
Maheswari K. [1]
Ramakrishnan M. [2]
Affiliations
[1] Department of Computer Science, Bharathiar University, Coimbatore
[2] School of Information Technology, Madurai Kamaraj University, Madurai
Keywords
Big data analytics; clustering; Conditional Maximum Entropy MapReduce; dimensionality reduction; irrelevant data; Kernelized Spectral Clustering
DOI
10.1080/1206212X.2019.1587892
Abstract
Clustering is a key data mining technique for big data analysis, in which large volumes of data are grouped. The goal of clustering is to minimize dimensionality when accessing large volumes of data. Several data mining techniques have been developed for clustering. However, the clustering problem has grown rapidly in recent years, because existing clustering algorithms fail to minimize clustering time and most techniques require a large amount of memory to perform the clustering task. To improve clustering accuracy and minimize dimensionality, a Kernelized Spectral Clustering based Conditional Maximum Entropy MapReduce (KSC-CMEMR) technique is introduced. A number of data points are collected from a big dataset. The KSC-CMEMR technique partitions the data into clusters with a Kernelized Spectral Clustering process, which is based on the spectrum of the similarity matrix and also performs dimensionality reduction. Because it is driven by similarity, the Kernelized Spectral Clustering achieves higher clustering accuracy. The Conditional Maximum Entropy MapReduce model then eliminates the irrelevant data present in each cluster: it predicts the maximum probability that a data point is a member of the cluster and removes the irrelevant data from the cluster. This helps to reduce the false positive rate and the space complexity. The experimental evaluation uses clustering accuracy, clustering time, false positive rate, and space complexity as metrics, each measured with respect to the number of data points. The experimental results show that the proposed KSC-CMEMR technique achieves high clustering accuracy with minimal time and space complexity. © 2019 Informa UK Limited, trading as Taylor & Francis Group.
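The two stages described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no formulas, so every concrete choice here is an assumption. The sketch assumes a Gaussian (RBF) kernel for the similarity matrix, a split into only two clusters taken from the sign of the second eigenvector of the normalized similarity matrix (found by power iteration), and a softmax over squared distances to cluster centroids as a stand-in for the conditional maximum entropy model. The names `spectral_bipartition`, `map_phase`, `reduce_phase`, and `cluster_and_filter`, and the parameters `gamma`, `beta`, and `threshold`, are hypothetical, and the map and reduce steps run in a single process rather than on a MapReduce cluster.

```python
import math
import random
from collections import defaultdict


def sq_dist(x, y):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def spectral_bipartition(points, gamma=0.1, iters=200, seed=0):
    """Two-way split from the second eigenvector of D^-1/2 W D^-1/2 (assumed RBF kernel W)."""
    n = len(points)
    w = [[math.exp(-gamma * sq_dist(p, q)) for q in points] for p in points]
    deg = [sum(row) for row in w]
    m = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # The leading eigenvector (eigenvalue 1) is sqrt(deg), normalized to unit length.
    total = math.sqrt(sum(deg))
    top = [math.sqrt(d) / total for d in deg]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # Project out the leading eigenvector, then apply one power-iteration step.
        overlap = sum(a * b for a, b in zip(v, top))
        v = [a - overlap * b for a, b in zip(v, top)]
        v = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    return [1 if a > 0 else 0 for a in v]


def centroids_of(points, labels):
    """Mean point of each cluster, keyed by cluster label."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    return {lab: tuple(sum(c) / len(ps) for c in zip(*ps)) for lab, ps in groups.items()}


def map_phase(points, labels, centroids, beta):
    """Map: emit (cluster, (point, p(cluster | point))) using an assumed softmax model."""
    for p, lab in zip(points, labels):
        scores = {c: math.exp(-beta * sq_dist(p, mu)) for c, mu in centroids.items()}
        yield lab, (p, scores[lab] / sum(scores.values()))


def reduce_phase(pairs, threshold):
    """Reduce: per cluster, keep only points whose membership probability clears the threshold."""
    kept = defaultdict(list)
    for lab, (p, prob) in pairs:
        if prob >= threshold:
            kept[lab].append(p)
    return dict(kept)


def cluster_and_filter(points, gamma=0.1, beta=0.1, threshold=0.9):
    """Cluster, then drop low-probability ("irrelevant") members from each cluster."""
    labels = spectral_bipartition(points, gamma)
    centroids = centroids_of(points, labels)
    return reduce_phase(map_phase(points, labels, centroids, beta), threshold)


# Two tight groups plus one ambiguous point halfway between them.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
          (5.0, 5.0), (5.2, 5.1), (5.1, 4.8), (4.9, 5.2),
          (2.5, 2.6)]
clusters = cluster_and_filter(points)
```

In this toy run the halfway point is assigned to one of the two clusters by the spectral step but has a membership probability well below the assumed threshold, so the reduce step discards it. Which numeric label each group receives depends on the sign of the eigenvector and carries no meaning.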
Pages: 601-611
Page count: 10