An extended visual methods to perform data cluster assessment in distributed data systems

Authors
K. Subba Reddy
K. Rajendra Prasad
Govardhan Reddy Kamatam
N. Ramanjaneya Reddy
Affiliations
[1] Rajeev Gandhi Memorial College of Engineering and Technology, Department of CSE
[2] G. Pulla Reddy Engineering College, Department of CSE
[3] KSRM College of Engineering, Department of CSE
Keywords
Big data; Cluster; LSL; VAT
DOI
Not available
Abstract
Cluster tendency is one of the major problems in data clustering: before clustering an unlabeled dataset, one must determine how many clusters it contains. This paper considers that preclustering problem for widely used clustering methods such as k-means and hierarchical clustering. Existing preclustering methods, such as visual assessment of cluster tendency (VAT), estimate the cluster tendency (i.e., the k in k-means) effectively. Enhanced methods, such as the improved VAT (iVAT) and other related visual methods, have succeeded in determining the precluster tendency for complex and large datasets. Clustering using the improved visual assessment of tendency (ClusiVAT) is a recent visual method that is widely used for large datasets. It uses sampling to handle the volume of the original data, but it does not address dimensionality, whereas big data in real-time applications is both large and high-dimensional. The proposed method therefore develops scalable visual methods that use linear subspace learning (LSL) techniques to overcome the curse of dimensionality. An empirical analysis on benchmark datasets demonstrates the efficiency of the proposed LSL-based visual methods.
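The idea summarized in the abstract, reducing dimensionality with a linear subspace method and then running a VAT-style reordering, can be illustrated with a minimal sketch. This is not the authors' implementation: PCA is assumed here as one example of an LSL technique (the abstract does not say which techniques the paper uses), the reordering follows the commonly described Prim-like VAT procedure, the ClusiVAT sampling step is omitted, and all function names (`pca_project`, `pairwise_distances`, `vat_order`, `vat_image`) are hypothetical.

```python
import numpy as np


def pca_project(X, n_components):
    """Project rows of X onto the top principal directions (PCA assumed as the LSL step)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def pairwise_distances(X):
    """Dense Euclidean dissimilarity matrix; O(n^2 * d) memory, fine for small n only."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def vat_order(R):
    """Return a VAT-style ordering of the objects described by dissimilarity matrix R."""
    n = R.shape[0]
    # Start from one endpoint of the largest dissimilarity.
    first = np.unravel_index(np.argmax(R), R.shape)[0]
    order = [first]
    remaining = np.ones(n, dtype=bool)
    remaining[first] = False
    # min_dist[j]: smallest dissimilarity from j to any already ordered object.
    min_dist = R[first].copy()
    for _ in range(n - 1):
        candidates = np.where(remaining)[0]
        # Prim-like step: append the unordered object closest to the ordered set.
        nxt = candidates[np.argmin(min_dist[candidates])]
        order.append(nxt)
        remaining[nxt] = False
        min_dist = np.minimum(min_dist, R[nxt])
    return np.array(order)


def vat_image(X, n_components):
    """Reduce dimensionality, then return the reordered dissimilarity matrix and the order."""
    Z = pca_project(X, n_components)
    R = pairwise_distances(Z)
    order = vat_order(R)
    return R[np.ix_(order, order)], order
```

Displaying the returned matrix as a grayscale image (for example with an image-plotting routine of your choice) should show dark square blocks along the diagonal when the data contains well-separated clusters; the number of blocks suggests a value for k. The dense distance matrix limits this sketch to modest sample counts.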
Pages: 8810 - 8829
Number of pages: 19
Related papers
  • [1] An extended visual methods to perform data cluster assessment in distributed data systems
    Reddy, K. Subba
    Prasad, K. Rajendra
    Kamatam, Govardhan Reddy
    Reddy, N. Ramanjaneya
    [J]. JOURNAL OF SUPERCOMPUTING, 2022, 78 (06) : 8810 - 8829