Clustering item data sets with association-taxonomy similarity

Cited by: 1
Authors
Yun, CH [1 ]
Chuang, KT [1 ]
Chen, MS [1 ]
Affiliations
[1] Natl Taiwan Univ, Dept Elect Engn, Grad Inst Commun Engn, Taipei, Taiwan
DOI
10.1109/ICDM.2003.1251011
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper explores the efficient clustering of item data. Unlike traditional data, item data are high-dimensional and sparse. In view of these features, we devise a novel measurement, called the association-taxonomy similarity, and use it to perform the clustering. With this measurement, we develop an efficient clustering algorithm for item data, called algorithm AT (standing for Association-Taxonomy). Two validation indexes based on association and taxonomy properties are also devised to assess the quality of clustering for item data. Experimental results on a real dataset show that algorithm AT significantly outperforms prior works in clustering quality as measured by the validation indexes, indicating the usefulness of association-taxonomy similarity in item data clustering.
Pages: 697 - 700
Page count: 4
Related papers
50 in total
  • [21] Categorical Data Clustering: A Bibliometric Analysis and Taxonomy
    Cendana, Maya
    Kuo, Ren-Jieh
    MACHINE LEARNING AND KNOWLEDGE EXTRACTION, 2024, 6 (02) : 1009 - 1054
  • [22] Patch clustering for massive data sets
    Alex, Nikolai
    Hasenfuss, Alexander
    Hammer, Barbara
    NEUROCOMPUTING, 2009, 72 (7-9) : 1455 - 1469
  • [23] Efficient clustering of large data sets
    Ananthanarayana, VS
    Murty, MN
    Subramanian, DK
    PATTERN RECOGNITION, 2001, 34 (12) : 2561 - 2563
  • [24] Extended many-item similarity indices for sets of nucleotide and protein sequences
    Bajusz, David
    Miranda-Quintana, Ramon Alain
    Racz, Anita
    Heberger, Karoly
    COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2021, 19 (19) : 3628 - 3639
  • [25] An Item-Item Similarity Approach based on Linked Open Data Semantic Relationship
    Pereira, Italo M.
    Ferreira, Anderson A.
    WEBMEDIA 2019: PROCEEDINGS OF THE 25TH BRAZILLIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB, 2019, : 425 - 432
  • [26] Comparing two dissolution data sets for similarity
    Tsong, Y
    Sathe, P
    Shah, VP
    AMERICAN STATISTICAL ASSOCIATION - 1996 PROCEEDINGS OF THE BIOPHARMACEUTICAL SECTION, 1996, : 129 - 134
  • [27] NORMATIVE DATA FOR 2 MMPI CRITICAL ITEM SETS
    EVANS, RG
    JOURNAL OF CLINICAL PSYCHOLOGY, 1984, 40 (02) : 512 - 515
  • [28] SIMILARITY SCREENING OF MOLECULAR-DATA SETS
    GOOD, AC
    HODGKIN, EE
    RICHARDS, WG
    JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, 1992, 6 (05) : 513 - 520
  • [29] A similarity computing algorithm for volumetric data sets
    Zhang, T
    Chen, W
    Hu, M
    Peng, QS
    FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, PT 2, PROCEEDINGS, 2005, 3614 : 742 - 751
  • [30] Using Conditional Association to Identify Locally Independent Item Sets
    Straat, J. Hendrik
    van der Ark, L. Andries
    Sijtsma, Klaas
    METHODOLOGY-EUROPEAN JOURNAL OF RESEARCH METHODS FOR THE BEHAVIORAL AND SOCIAL SCIENCES, 2016, 12 (04) : 117 - 123