Kernelized Spectral Clustering based Conditional MapReduce function with big data

Cited by: 1
Authors
Maheswari K. [1]
Ramakrishnan M. [2]
Affiliations
[1] Department of Computer Science, Bharathiar University, Coimbatore
[2] School of Information Technology, Madurai Kamaraj University, Madurai
Keywords
Big data analytics; clustering; Conditional Maximum Entropy MapReduce; dimensionality reduction; irrelevant data; Kernelized Spectral Clustering
DOI
10.1080/1206212X.2019.1587892
Abstract
Clustering is a significant data mining technique for big data analysis, in which large volumes of data are grouped. The aim of clustering is to reduce dimensionality when accessing large volumes of data. Several data mining techniques have been developed for clustering data. However, the clustering problem has grown rapidly in recent years, because existing clustering algorithms fail to minimize clustering time and the majority of techniques require huge amounts of memory to perform the clustering task. To improve clustering accuracy and reduce dimensionality, a Kernelized Spectral Clustering based Conditional Maximum Entropy MapReduce (KSC-CMEMR) technique is introduced. A number of data points are collected from a big dataset. The KSC-CMEMR technique partitions the data into different clusters using a Kernelized Spectral Clustering process, which is based on the spectrum of the similarity matrix and also performs dimensionality reduction. Because it is driven by this similarity, the Kernelized Spectral Clustering achieves higher clustering accuracy. After that, the Conditional Maximum Entropy MapReduce model eliminates the irrelevant data present in each cluster. The model predicts the maximum probability that a data point is a member of a cluster and removes the irrelevant data from the cluster. This helps to reduce the false positive rate and the space complexity. Experimental evaluation is carried out with parameters such as clustering accuracy, clustering time, false positive rate, and space complexity with respect to the number of data points. The experimental results show that the proposed KSC-CMEMR technique obtains high clustering accuracy with minimum time as well as space complexity. © 2019 Informa UK Limited, trading as Taylor & Francis Group.
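The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the kernel, the eigen-decomposition variant, or the form of the maximum entropy model, so the Gaussian (RBF) kernel, the normalized-affinity embedding, the softmax membership model, and every function name and parameter below (`gamma`, `beta`, `threshold`, `map_phase`, `reduce_phase`) are assumptions chosen for illustration. The MapReduce step is simulated in-process with plain generators rather than run on a cluster.

```python
import numpy as np
from collections import defaultdict

def rbf_kernel(X, gamma=1.0):
    """Pairwise Gaussian (RBF) similarity matrix (kernel choice is an assumption)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def spectral_embedding(K, k):
    """Top-k eigenvectors of D^{-1/2} K D^{-1/2}, rows scaled to unit length."""
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
    A = d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(A)  # eigenvalues come back in ascending order
    U = vecs[:, -k:]             # n x k embedding: the dimensionality reduction
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans(U, k, iters=50):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [U[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([U[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

def membership_probs(U, centers, beta=1.0):
    """Softmax over negative squared distances: an exponential-form p(cluster | point).
    Stand-in for the conditional maximum entropy model (an assumption)."""
    d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    logits = -beta * d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def map_phase(chunk_ids, probs, threshold):
    """Map: emit (cluster, point_id) only if the best membership probability passes."""
    for i in chunk_ids:
        c = int(probs[i].argmax())
        if probs[i, c] >= threshold:
            yield c, int(i)

def reduce_phase(pairs):
    """Reduce: group the surviving point ids by cluster."""
    clusters = defaultdict(list)
    for c, i in pairs:
        clusters[c].append(i)
    return dict(clusters)
```

A typical call sequence would be `U = spectral_embedding(rbf_kernel(X), k)`, then `labels, centers = kmeans(U, k)`, then `probs = membership_probs(U, centers)`, followed by running `map_phase` over index chunks and passing the collected pairs to `reduce_phase`. Points whose best membership probability falls below `threshold` are treated as irrelevant and never reach the reducer, which mirrors the filtering role the abstract assigns to the CMEMR step.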
Pages: 601 - 611
Page count: 10
Related papers
50 in total
  • [41] Challenges for MapReduce in Big Data
    Grolinger, Katarina
    Hayes, Michael
    Higashino, Wilson A.
    L'Heureux, Alexandra
    Allison, David S.
    Capretz, Miriam A. M.
    2014 IEEE WORLD CONGRESS ON SERVICES (SERVICES), 2014 : 182 - 189
  • [42] Random Partition Based Adaptive Distributed Kernelized SVM for Big Data
    Pal, Amrit
    Chowdhury, Abishi
    Satakshi
    Narman, Husnu S.
    Chowdhury, Arkabandhu
    Kumar, Manish
    IEEE ACCESS, 2022, 10 : 95623 - 95637
  • [43] MapReduce-based parallel GEP algorithm for efficient function mining in big data applications
    Liu, Yang
    Ma, Chenxiao
    Xu, Lixiong
    Shen, Xiaodong
    Li, Maozhen
    Li, Pengcheng
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30 (23):
  • [44] MapReduce framework based big data clustering using fractional integrated sparse fuzzy C means algorithm
    Kulkarni, Omkaresh
    Jena, Sudarson
    Ravi Sankar, V.
    IET IMAGE PROCESSING, 2020, 14 (12) : 2719 - 2727
  • [45] Research on spectral clustering algorithm for network communication big data based on wavelet analysis
    Dai, Xinjian
    Zeng, Zhichao
    INTERNATIONAL JOURNAL OF AUTONOMOUS AND ADAPTIVE COMMUNICATIONS SYSTEMS, 2022, 15 (02) : 93 - 105
  • [46] MapReduce based Classification for Fault Detection in Big Data Applications
    Shafiq, M. Omair
    Fekri, Maryam
    Ibrahim, Rami
    2017 16TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2017 : 637 - 642
  • [47] A MapReduce-based Fuzzy Associative Classifier for Big Data
    Ducange, Pietro
    Marcelloni, Francesco
    Segatori, Armando
    2015 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE 2015), 2015,
  • [48] MapReduce-based storage and indexing for big health data
    Gayathiri, N. R.
    Natarajan, A. M.
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2019, 31 (14):
  • [49] Verifying Properties of MapReduce-Based Big Data Processing
    Zhang, Nan
    Wang, Meng
    Duan, Zhenhua
    Tian, Cong
    IEEE TRANSACTIONS ON RELIABILITY, 2022, 71 (01) : 321 - 338
  • [50] Research on large data set clustering method based on MapReduce
    Wei, Pengcheng
    He, Fangcheng
    Li, Li
    Shang, Chuanfu
    Li, Jing
    NEURAL COMPUTING & APPLICATIONS, 2020, 32 (01) : 93 - 99