A PARTITION-BASED FEATURE SELECTION METHOD FOR MIXED DATA: A FILTER APPROACH

Authors
Dutt, Ashish [1]
Ismail, Maizatul Akmar [1]
Affiliations
[1] Univ Malaya, Fac Comp Sci & Informat Technol, Dept Informat Syst, Kuala Lumpur 50603, Malaysia
Keywords
Clustering; educational data mining; mixed data; unsupervised feature selection; GENERAL COEFFICIENT; SIMILARITY
DOI
10.22452/mjcs.vol33no2.5
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is fundamentally an optimization problem: selecting relevant features from several alternatives in clustering problems. Although several algorithms have been suggested, none has so far been shown to be the best for every problem scenario, so researchers continue to strive to develop superior algorithms. Although clustering is often considered a pre-processing task, what it actually does is divide the data into groups. In this paper we propose an improved distance function for clustering mixed data. The Gower distance, a similarity measure for mixed data, is adopted and modified to define the similarity between object pairs. A partitional algorithm for mixed data is then employed to group similar objects into clusters. The performance of the proposed method has been evaluated on similar mixed and real educational datasets in terms of the silhouette coefficient. Results reveal the effectiveness of this algorithm in unsupervised discovery problems. The proposed algorithm performed better than other clustering algorithms on various datasets.
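As a rough illustration of the similarity measure the abstract builds on, the sketch below computes the standard (unmodified) Gower distance between mixed-type records. The abstract does not describe the authors' modification, so none is attempted here; the function names, the toy records, and the choice to give every feature equal weight are assumptions made only for this example.

```python
from typing import List, Optional, Sequence


def column_ranges(rows: Sequence[Sequence], is_numeric: Sequence[bool]) -> List[Optional[float]]:
    """Return max - min for each numeric column, and None for categorical columns."""
    ranges: List[Optional[float]] = []
    for k, numeric in enumerate(is_numeric):
        if numeric:
            column = [row[k] for row in rows]
            ranges.append(max(column) - min(column))
        else:
            ranges.append(None)
    return ranges


def gower_distance(a: Sequence, b: Sequence, ranges: Sequence[Optional[float]]) -> float:
    """Standard Gower distance with equal feature weights (an assumption).

    Numeric features contribute |a - b| / range; categorical features
    contribute 0 when the values match and 1 otherwise.
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            total += 0.0 if x == y else 1.0
        elif r > 0:
            total += abs(x - y) / r
        # A constant numeric column (range 0) contributes nothing.
    return total / len(ranges)


# Hypothetical toy records: (score, group, passed)
rows = [
    (20.0, "A", "yes"),
    (30.0, "B", "yes"),
    (40.0, "A", "no"),
]
ranges = column_ranges(rows, is_numeric=[True, False, False])

d01 = gower_distance(rows[0], rows[1], ranges)
d02 = gower_distance(rows[0], rows[2], ranges)
```

A pairwise matrix of such distances could then be passed to a partitional algorithm that accepts precomputed dissimilarities and scored with the silhouette coefficient, which is the evaluation criterion the abstract names.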
Pages: 152-169
Number of pages: 18