A PARTITION-BASED FEATURE SELECTION METHOD FOR MIXED DATA: A FILTER APPROACH

Times cited: 0
Authors
Dutt, Ashish [1]
Ismail, Maizatul Akmar [1]
Affiliation
[1] Univ Malaya, Fac Comp Sci & Informat Technol, Dept Informat Syst, Kuala Lumpur 50603, Malaysia
Keywords
Clustering; educational data mining; mixed data; unsupervised feature selection; general coefficient; similarity
DOI
10.22452/mjcs.vol33no2.5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection in clustering is fundamentally an optimization problem: selecting the relevant features from among several alternatives. Although many algorithms have been proposed, none has yet been shown to be the best for every problem scenario, so researchers continue to develop better ones. Clustering is often regarded as a pre-processing task, but what it actually does is divide the data into groups. In this paper, we propose an improved distance function for clustering mixed data. The Gower distance, a similarity measure for mixed data, is adopted and modified to define the similarity between pairs of objects. A partitional algorithm for mixed data is then employed to group similar objects into clusters. The performance of the proposed method is evaluated in terms of the silhouette coefficient on similar mixed datasets and on a real educational dataset. The results show the effectiveness of the algorithm for unsupervised discovery problems: the proposed algorithm performed better than other clustering algorithms on various datasets.
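The abstract does not spell out the authors' modification of the Gower distance or name the partitional algorithm, so the following is only a minimal sketch of the standard ingredients it builds on, under stated assumptions: the plain unweighted Gower distance (range-normalized absolute difference for numeric features, 0/1 mismatch for categorical ones), k-medoids swapped in as an example partitional algorithm, and the usual mean silhouette coefficient. All function names and the toy records are hypothetical and for illustration only; this is not the method proposed in the paper.

```python
"""Sketch: standard Gower distance + k-medoids + silhouette (NOT the paper's modified method)."""
from typing import List, Sequence, Tuple, Union

Value = Union[float, str]


def gower_distance(a: Sequence[Value], b: Sequence[Value],
                   is_numeric: Sequence[bool], ranges: Sequence[float]) -> float:
    """Unweighted Gower distance between two mixed-type records.

    Numeric feature k contributes |a_k - b_k| / R_k (R_k = range of feature k);
    a categorical feature contributes 0 on a match and 1 on a mismatch.
    The distance is the mean contribution over all features.
    """
    total = 0.0
    for k, numeric in enumerate(is_numeric):
        if numeric:
            # A constant numeric column (range 0) contributes nothing.
            total += abs(a[k] - b[k]) / ranges[k] if ranges[k] > 0 else 0.0
        else:
            total += 0.0 if a[k] == b[k] else 1.0
    return total / len(is_numeric)


def distance_matrix(data: List[List[Value]], is_numeric: Sequence[bool]) -> List[List[float]]:
    """Pairwise Gower distances; numeric ranges are taken over the whole dataset."""
    ranges = []
    for k, numeric in enumerate(is_numeric):
        col = [row[k] for row in data]
        ranges.append(max(col) - min(col) if numeric else 0.0)
    return [[gower_distance(x, y, is_numeric, ranges) for y in data] for x in data]


def k_medoids(dist: List[List[float]], k: int, max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating k-medoids (a stand-in partitional algorithm): assign each point to its
    nearest medoid, then move each medoid to the member with the smallest total
    distance to the rest of its cluster, until the medoids stop changing."""
    n = len(dist)
    medoids = list(range(k))  # assumption: deterministic init with the first k records

    def assign(meds: List[int]) -> List[int]:
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid for an empty cluster
                new_medoids.append(medoids[c])
            else:
                new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(medoids), medoids


def silhouette(dist: List[List[float]], labels: List[int]) -> float:
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b); needs at least 2 clusters."""
    n = len(dist)
    scores = []
    for i in range(n):
        same = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not same:  # common convention: a point in a singleton cluster scores 0
            scores.append(0.0)
            continue
        a = sum(dist[i][j] for j in same) / len(same)  # mean intra-cluster distance
        b = min(  # mean distance to the nearest other cluster
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


# Toy mixed records (hypothetical): [age, major, passed]
data = [[25.0, "CS", "yes"], [27.0, "CS", "yes"],
        [52.0, "Math", "no"], [55.0, "Math", "no"]]
is_numeric = [True, False, False]
D = distance_matrix(data, is_numeric)
labels, medoids = k_medoids(D, k=2)
print(labels, round(silhouette(D, labels), 3))  # → [0, 0, 1, 1] 0.971
```

k-medoids is used here only because it works directly on a precomputed distance matrix, which suits a non-Euclidean measure such as Gower; the paper's own partitional algorithm and its modified distance may differ.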
Pages: 152-169
Number of pages: 18