Classification by clustering using an extended saliency measure

Times cited: 2
Authors
Barak, A. [1]
Gelbard, R. [1]
Affiliation
[1] Bar Ilan Univ, Grad Sch Business Adm, Informat Syst Program, IL-52900 Ramat Gan, Israel
Keywords
data mining; cluster analysis; classification; decision trees; bounded-rationality; saliency; classification by clustering (CBC); DESIGN SCIENCE; DECISION; REPRESENTATION; METHODOLOGY; SIMILARITY; SYSTEM; MODEL
DOI
10.1111/exsy.12121
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In many data mining tasks, the goal is to classify entities into a set of predefined groups (classes). A second, equally important goal is interpretation, i.e. understanding the nature of the population aggregated in each class. These tasks become even more complex when there is no a priori information about the correct classification. The current paper builds on two concepts. The first is Bounded-Rationality theory, which employs an S-shaped function that represents human logic as a saliency measure. This measure determines the substantial features that characterize each potential group. The second is Classification by clustering (CBC), which applies Decision Tree-like classification to unsupervised clustering problems, where neither an a priori classification nor target attributes are known in advance. Building on these two concepts, the research makes two contributions: (1) it extends the saliency measure to all types of variables (nominal as well as numerical), and (2) it evaluates, on five datasets, a composite model that combines the CBC method with the saliency concept. The findings show that using clustering algorithms for classification tasks (the CBC method) yields results as accurate as those obtained by conventional Decision Trees, but with a better saliency factor.
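This record does not give the paper's actual formula. The sketch below only illustrates the general idea of an S-shaped saliency measure that covers both numerical and nominal variables. It rests on several assumptions of our own:

- The S-curve is a logistic function with hypothetical `steepness` and `midpoint` parameters.
- A numerical feature's deviation is the distance between the cluster mean and the population mean, measured in population standard deviations.
- A nominal feature is handled by treating each category as a 0/1 indicator and reusing the numerical rule.

All function names and the toy data are hypothetical.

```python
import math
from statistics import mean, pstdev


def sigmoid_saliency(deviation, steepness=6.0, midpoint=0.5):
    """Map a non-negative deviation onto (0, 1) with a logistic S-curve.

    steepness and midpoint are hypothetical tuning parameters, not values
    taken from the paper.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (deviation - midpoint)))


def numeric_deviation(cluster_values, all_values):
    """|cluster mean - population mean|, in population standard deviations."""
    sd = pstdev(all_values)
    if sd == 0:
        return 0.0  # a constant feature cannot characterize any cluster
    return abs(mean(cluster_values) - mean(all_values)) / sd


def nominal_deviation(cluster_values, all_values):
    """Assumed extension to nominal data: treat each category as a 0/1
    indicator, reuse the numeric rule, and keep the most deviating category."""
    return max(
        numeric_deviation(
            [1.0 if v == cat else 0.0 for v in cluster_values],
            [1.0 if v == cat else 0.0 for v in all_values],
        )
        for cat in set(all_values)
    )


def feature_saliency(cluster_rows, all_rows, feature, kind):
    """Saliency of one feature for one cluster; kind is 'numeric' or 'nominal'."""
    cluster_values = [row[feature] for row in cluster_rows]
    all_values = [row[feature] for row in all_rows]
    if kind == "numeric":
        deviation = numeric_deviation(cluster_values, all_values)
    elif kind == "nominal":
        deviation = nominal_deviation(cluster_values, all_values)
    else:
        raise ValueError(f"unknown feature kind: {kind!r}")
    return sigmoid_saliency(deviation)


# Illustrative toy data (not from the paper); the "cluster" is the first three rows.
rows = [
    {"age": 22, "plan": "basic", "region": "north"},
    {"age": 25, "plan": "basic", "region": "south"},
    {"age": 24, "plan": "basic", "region": "east"},
    {"age": 61, "plan": "premium", "region": "north"},
    {"age": 58, "plan": "premium", "region": "south"},
    {"age": 65, "plan": "premium", "region": "east"},
]
cluster = rows[:3]

for feature, kind in [("age", "numeric"), ("plan", "nominal"), ("region", "nominal")]:
    print(feature, round(feature_saliency(cluster, rows, feature, kind), 3))
```

In this toy cluster, `age` and `plan` both separate the cluster from the rest of the population and score close to 1. `region` has the same distribution inside and outside the cluster and scores close to 0. In the spirit of CBC, only the high-saliency features would be kept to describe the cluster.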
Pages: 46-59
Number of pages: 14
Related papers (50 in total)
  • [1] Clustering Categorical Data Using an Extended Modularity Measure
    Labiod, Lazhar
    Grozavu, Nistor
    Bennani, Younes
    NEURAL INFORMATION PROCESSING: MODELS AND APPLICATIONS, PT II, 2010, 6444 : 310 - 320
  • [2] Software component clustering and classification using novel similarity measure
    Srinivas, Chintakindi
    Radhakrishna, Vangipuram
    Rao, C. V. Guru
    8TH INTERNATIONAL CONFERENCE INTERDISCIPLINARITY IN ENGINEERING, INTER-ENG 2014, 2015, 19 : 866 - 873
  • [3] A Similarity Measure for Text Classification and Clustering
    Lin, Yung-Shen
    Jiang, Jung-Yi
    Lee, Shie-Jue
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (07) : 1575 - 1590
  • [5] Machine Learning for Image Classification and Clustering Using a Universal Distance Measure
    Chester, Uzi
    Ratsaby, Joel
    SIMILARITY SEARCH AND APPLICATIONS (SISAP), 2013, 8199 : 59 - 72
  • [6] Extended Box Clustering for Classification Problems
    Spinelli, Vincenzo
    JOURNAL OF CLASSIFICATION, 2018, 35 (01) : 100 - 123
  • [7] Unsupervised multistage image classification using hierarchical clustering with a Bayesian similarity measure
    Lee, S
    Crawford, MM
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2005, 14 (03) : 312 - 320
  • [8] A Comment on "A Similarity Measure for Text Classification and Clustering"
    Nagwani, Naresh Kumar
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (09) : 2589 - 2590
  • [9] An Improved Similarity Measure for Text Clustering and Classification
    Reddy, G. Suresh
    Kanth, T. V. Rajini
    Rao, A. Ananda
    ADVANCED SCIENCE LETTERS, 2015, 21 (11) : 3583 - 3590
  • [10] Clustering ensemble selection based on the extended Jaccard measure
    Khalili, Hajar
    Rabbani, Mohsen
    Akbari, Ebrahim
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2021, 29 (04) : 2215 - 2231