A Two-Phase Approach for Semi-Supervised Feature Selection

Authors
Saxena, Amit [1 ]
Pare, Shreya [2 ]
Meena, Mahendra Singh [2 ]
Gupta, Deepak [3 ]
Gupta, Akshansh [4 ]
Razzak, Imran [5 ]
Lin, Chin-Teng [2 ]
Prasad, Mukesh [2 ]
Affiliations
[1] Guru Ghasidas Univ, Dept Comp Sci & Informat Technol, Bilaspur 495009, Chhattisgarh, India
[2] Univ Technol Sydney, Sch Comp Sci, FEIT, Sydney, NSW 2007, Australia
[3] Natl Inst Technol Arunachal Pradesh, Dept Comp Sci & Engn, Yupia 791112, India
[4] Cent Elect Engn Res Inst, Delhi 110028, India
[5] Deakin Univ, Sch Informat Technol, Geelong, Vic 3217, Australia
Funding
Australian Research Council
Keywords
feature selection; semi-supervised datasets; classification; clustering; correlation; RECOGNITION;
DOI
10.3390/a13090215
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes a novel approach for selecting a subset of features in semi-supervised datasets, where only some of the patterns are labeled. The process is completed in two phases. In Phase-I, the dataset is divided into two parts: the first part contains the labeled patterns and the second part contains the unlabeled patterns. A small number of features are identified using the well-known maximum relevance (computed on the first part) and minimum redundancy (computed on the whole dataset) feature selection approaches, both based on the correlation coefficient. From this identified set, the subset of features that produces a high classification accuracy on the labeled patterns, using any supervised classifier, is selected for later processing. In Phase-II, the patterns of the first and second parts are clustered separately, each into as many clusters as the dataset has classes. Each cluster of the first part is assigned the class to which the majority of its patterns belong, since those class labels are already given. The cluster centroids of the two parts are then paired: each centroid of the first part is paired with the nearest centroid of the second part. Because the class of the first-part centroid is known, the same class is assigned to the paired cluster of the second part, whose class is unknown. If the actual classes of the second-part patterns are available, they can be used to test the classification accuracy on the second part. The proposed two-phase approach performs well in terms of classification accuracy and number of features selected on the given benchmark datasets.
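The two phases described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`pearson`, `select_features`, `kmeans`, `transfer_labels`) are hypothetical, the relevance-minus-redundancy score is one common way to combine the two criteria, class labels are assumed to be numeric, the classifier-based subset check of Phase-I is omitted, plain Lloyd's k-means seeded with the first k points stands in for whatever clustering method the paper uses, and the one-to-one greedy pairing of centroids is an assumption.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def column(rows, j):
    return [r[j] for r in rows]

def select_features(labeled, labels, all_rows, k):
    """Phase-I sketch: greedy max-relevance / min-redundancy ranking."""
    n_feat = len(all_rows[0])
    # Relevance: |corr(feature, class)| on the labeled part only.
    rel = [abs(pearson(column(labeled, j), labels)) for j in range(n_feat)]
    chosen = []
    while len(chosen) < k:
        def score(j):
            if not chosen:
                return rel[j]
            # Redundancy: mean |corr| with chosen features on the whole dataset.
            red = sum(abs(pearson(column(all_rows, j), column(all_rows, s)))
                      for s in chosen) / len(chosen)
            return rel[j] - red  # assumed combination rule
        chosen.append(max((j for j in range(n_feat) if j not in chosen), key=score))
    return chosen

def nearest(point, centroids):
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means, seeded with the first k points (assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return centroids, [nearest(p, centroids) for p in points]

def transfer_labels(labeled, labels, unlabeled, k):
    """Phase-II sketch: cluster both parts, pair centroids, copy the class."""
    cen_l, asg_l = kmeans(labeled, k)
    cen_u, asg_u = kmeans(unlabeled, k)
    # Majority class of each labeled cluster (None if a cluster is empty).
    cluster_class = {}
    for c in range(k):
        votes = Counter(y for y, a in zip(labels, asg_l) if a == c)
        cluster_class[c] = votes.most_common(1)[0][0] if votes else None
    # Pair each labeled centroid with the nearest still-unpaired unlabeled one.
    free, pair = set(range(k)), {}
    for c in range(k):
        u = min(sorted(free), key=lambda i: math.dist(cen_u[i], cen_l[c]))
        pair[u] = c
        free.remove(u)
    return [cluster_class[pair[a]] for a in asg_u]

# Toy data (made up for illustration): two well-separated classes.
labeled = [[0.0, 0.0, 5.0], [10.0, 10.0, 5.1], [0.5, 0.2, 4.9],
           [9.5, 10.2, 5.0], [0.1, 0.4, 5.2], [10.1, 9.7, 4.8]]
labels = [0, 1, 0, 1, 0, 1]
unlabeled = [[9.8, 10.1, 5.0], [0.3, 0.1, 5.1], [10.2, 9.9, 4.9], [0.2, 0.3, 5.0]]
print(transfer_labels(labeled, labels, unlabeled, 2))  # prints [1, 0, 1, 0]
```

In a fuller pipeline, the rows would first be projected onto the indices returned by `select_features` before clustering, and the predicted classes would be compared against the true classes of the second part when those are known.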
Pages: 23