Min max kurtosis distance based improved initial centroid selection approach of K-means clustering for big data mining on gene expression data

Authors
Kamlesh Kumar Pandey
Diwakar Shukla
Affiliations
[1] Dr. Harisingh Gour Vishwavidyalaya,Department of Computer Science and Applications
[2] Dr. Harisingh Gour Vishwavidyalaya,Department of Mathematics and Statistics
Source
Evolving Systems | 2023 / Vol. 14, Issue 2
Keywords
Big data clustering; Initial centroid algorithm; Genome clustering; Gene expression data clustering; Kurtosis clustering; Systematic sampling; Sorting heuristic; Convergence speed; K-means
DOI
Not available
Abstract
Genome clustering is a big data application that helps identify the prognosis of serious diseases and biological processes across enormous sets of genes. The K-Means (KM) algorithm is the most commonly used clustering algorithm for gene expression data; it extracts hidden knowledge, patterns and trends from gene expression profiles to support decision-making. Unfortunately, the KM algorithm is extremely sensitive to initial centroid selection, since the initial cluster centroids influence computational effectiveness, efficiency, cost and susceptibility to local optima. Existing centroid initialization algorithms incur high computational complexity on high-dimensional data because of extensive iterations, distance computations, and data and result comparisons. To overcome these weaknesses, this study proposes the Min–Max Kurtosis Distance (MMKD) algorithm for big data clustering in a single-machine environment. The MMKD algorithm addresses these KM weaknesses by using the distance between the data points of origin and the minimum–maximum kurtosis dimension. The proposed algorithm is compared with the KM, KM++, ADV, MKM, Mean-KM, NFD, K-MAM, NRKM2, FMNN and MuKM algorithms using internal and external effectiveness validation metrics and efficiency measurements on sixteen gene expression datasets. The experimental evaluation demonstrates that, compared with the other algorithms, the MMKDKM algorithm reduces iterations, local optima and computation costs, and improves cluster performance, effectiveness and efficiency with stable convergence. The statistical analysis indicates that the proposed MMKDKM algorithm achieves a significant difference.
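The abstract gives only a one-sentence outline of MMKD, so the following is a minimal Python sketch of one possible reading, not the published algorithm. It assumes that (a) kurtosis is computed per dimension, (b) each point is ranked by its Euclidean distance from the origin restricted to the lowest- and highest-kurtosis dimensions, and (c) the K seeds are taken from the sorted list by systematic sampling, as the keywords "Sorting heuristic" and "Systematic sampling" suggest. The function names `kurtosis` and `mmkd_style_init` are hypothetical.

```python
import math


def kurtosis(values):
    """Population (Pearson) kurtosis m4 / m2**2; 0.0 for a constant column."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) if m2 > 0 else 0.0


def mmkd_style_init(points, k):
    """Pick k initial centroids from `points` (a list of equal-length tuples).

    Illustrative assumption only: the paper's exact steps are not given
    in the abstract, so this follows the reading stated in the lead-in.
    """
    dims = range(len(points[0]))
    kurt = [kurtosis([p[d] for p in points]) for d in dims]
    d_min = min(dims, key=lambda d: kurt[d])  # lowest-kurtosis dimension
    d_max = max(dims, key=lambda d: kurt[d])  # highest-kurtosis dimension

    # Sorting heuristic: rank points by distance from the origin,
    # measured only in the two selected dimensions.
    ranked = sorted(points, key=lambda p: math.hypot(p[d_min], p[d_max]))

    # Systematic sampling: take the middle point of each of k equal blocks.
    n = len(ranked)
    return [ranked[(2 * i + 1) * n // (2 * k)] for i in range(k)]


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 1.0), (10.0, 11.0),
            (11.0, 10.0), (20.0, 22.0), (21.0, 20.0)]
    print(mmkd_style_init(data, 3))
```

Because the selection is deterministic, repeated runs on the same data give the same seeds, which is one plausible source of the stable convergence the abstract reports; the seeds would then be passed to an ordinary K-means loop.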
Pages: 207-244
Page count: 37