Finding number of clusters in single-step with similarity-based information-theoretic algorithm

Cited by: 1
Author
Temel, T. [1]
Affiliation
[1] Bursa Tech Univ, Dept Mechatron Engn, Fac Nat Sci Architecture & Engn, Bursa, Turkey
Keywords
computational complexity; entropy; pattern clustering; probability; statistical analysis; single-step algorithm; two-valued function; cluster-boundary indicator; probability descriptions; intercluster boundary; cluster availability; synthetic data sets; real data sets; time complexity; similarity-based information-theoretic sample entropy;
DOI
10.1049/el.2013.3362
Chinese Library Classification (CLC) numbers
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809
Abstract
A single-step algorithm is presented for finding the number of clusters in a dataset. An almost two-valued function, called the cluster-boundary indicator, is introduced; it is built from similarity-based information-theoretic sample entropy and probability descriptions. This function finds the inter-cluster boundary samples, which indicate the available clusters, in a single iteration. Experiments with synthetic and anonymous real datasets show that the new algorithm statistically outperforms its major counterparts in terms of both time complexity and the number of clusters successfully found.
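The abstract does not give the indicator's formula, so the following is only an illustrative sketch of the general idea, not the author's algorithm. It assumes a Gaussian-kernel similarity with bandwidth `sigma`, defines each sample's entropy as the Shannon entropy of its normalized similarities to all other samples, turns that into a strictly two-valued boundary flag with a hypothetical threshold `tau`, and counts clusters as connected groups of unflagged samples whose pairwise similarity is at least a hypothetical `eps`. All function and parameter names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def similarity(a: Point, b: Point, sigma: float) -> float:
    # Assumed Gaussian-kernel similarity; the paper's similarity measure may differ.
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def sample_entropies(points: List[Point], sigma: float) -> List[float]:
    # Shannon entropy (in nats) of each sample's similarities to all other
    # samples, normalized into a probability distribution.
    entropies: List[float] = []
    for i, p_i in enumerate(points):
        sims = [similarity(p_i, p_j, sigma) for j, p_j in enumerate(points) if j != i]
        total = sum(sims)
        if total == 0.0:
            # No usable neighbours (single sample or full underflow): entropy 0.
            entropies.append(0.0)
            continue
        h = 0.0
        for s in sims:
            prob = s / total
            if prob > 0.0:
                h -= prob * math.log(prob)
        entropies.append(h)
    return entropies


def boundary_indicator(entropies: List[float], tau: float) -> List[bool]:
    # Hypothetical two-valued indicator: True flags a sample whose similarity
    # mass is spread widely enough to be treated as an inter-cluster boundary.
    return [h > tau for h in entropies]


def count_clusters(points: List[Point], sigma: float, tau: float,
                   eps: float) -> Tuple[int, List[bool]]:
    flags = boundary_indicator(sample_entropies(points, sigma), tau)
    keep = [i for i, flagged in enumerate(flags) if not flagged]
    parent: Dict[int, int] = {i: i for i in keep}

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Join unflagged samples whose pairwise similarity reaches eps.
    for pos, i in enumerate(keep):
        for j in keep[pos + 1:]:
            if similarity(points[i], points[j], sigma) >= eps:
                parent[find(i)] = find(j)
    # Each remaining connected group counts as one cluster.
    return len({find(i) for i in keep}), flags


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (5.0, 0.0),
            (10.0, 0.0), (10.2, 0.0), (10.0, 0.2)]
    k, flags = count_clusters(data, sigma=1.0, tau=1.2, eps=0.5)
    print(k, flags)
```

On this toy data, the middle point (5.0, 0.0) spreads its similarity over both groups, gets the highest entropy, and is flagged, so two clusters are counted; with `tau=float("inf")` nothing is flagged and that point becomes a spurious third cluster. The pairwise loops make the sketch O(n²) in time, and entropy also grows with cluster size, so a fixed `tau` is a simplification that the paper's almost two-valued indicator may handle differently.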
Pages: 29 / U34
Number of pages: 2