Feature selection using Fisher score and multilabel neighborhood rough sets for multilabel classification

Cited by: 143

Authors:

Sun, Lin ^{[1,3,4]}

Wang, Tianxiang ^{[1]}

Ding, Weiping ^{[2]}

Xu, Jiucheng ^{[1,4]}

Lin, Yaojin ^{[3]}

Affiliations:

[1] Henan Normal Univ, Coll Comp & Informat Engn, Xinxiang 453007, Henan, Peoples R China

[2] Nantong Univ, Sch Informat Sci & Technol, Nantong 226019, Peoples R China

[3] Minnan Normal Univ, Key Lab Data Sci & Intelligence Applicat, Zhangzhou 363000, Peoples R China

[4] Key Lab Artificial Intelligence & Personalized Le, Xinxiang 453007, Henan, Peoples R China

Source:

INFORMATION SCIENCES | 2021 / Vol. 578

Funding:

National Natural Science Foundation of China;

Keywords:

Feature selection; Neighborhood rough sets; Fisher Score; Multilabel classification; LABEL FEATURE-SELECTION; UNCERTAINTY MEASURES; INFORMATION;

DOI:

10.1016/j.ins.2021.08.032

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

In recent years, feature selection for multilabel classification has attracted attention in machine learning and data mining. However, some feature selection methods ignore the correlations among labels, resulting in low performance, and most of them have difficulty determining an appropriate neighborhood radius for neighborhood systems and suffer from high time cost. To overcome these issues, we propose a novel feature selection method using Fisher score and multilabel neighborhood rough sets (MNRS) in multilabel neighborhood decision systems. First, to identify the correlations between labels under a binary distribution, two new types of mutual information between labels are considered, and their balance coefficients are defined. By enhancing strong correlations and weakening weak correlations between labels, a mutual information-based Fisher score model with a second-order correlation between labels is designed to fit multilabel data. Second, to address the problem of automatically choosing a neighborhood radius, a subset of heterogeneous and homogeneous samples is employed to develop a new classification margin as a neighborhood radius, and the concepts of neighborhood, neighborhood class, and upper and lower approximations are formulated for multilabel neighborhood decision systems. The weight and dependency degree are presented to effectively measure the uncertainty of samples in multilabel neighborhood decision systems. Thus, we further present a new classification margin-based MNRS model. Finally, a filter-wrapper preprocessing algorithm for feature selection using the improved Fisher score model is proposed to decrease the spatiotemporal complexity of multilabel data, and a heuristic feature selection algorithm is designed to improve classification performance on multilabel datasets. Experimental results on thirteen multilabel datasets show that the proposed algorithm is effective in selecting significant features, demonstrating its excellent classification ability on multilabel datasets. © 2021 Elsevier Inc. All rights reserved.
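
For orientation, the classical Fisher score that the abstract builds on can be sketched as follows. This is a minimal illustration only, not the paper's model: it omits the mutual information-based label correlation weighting and the balance coefficients, and simply averages the per-label scores, which is an assumption made here. The function names `fisher_score` and `multilabel_fisher_score` and the small `eps` constant are hypothetical.

```python
import numpy as np

def fisher_score(X, y, eps=1e-12):
    """Classical Fisher score of each feature for a single label vector y.

    Score_j = sum_c n_c * (mean_cj - mean_j)^2 / sum_c n_c * var_cj
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])  # between-class scatter per feature
    within = np.zeros(X.shape[1])   # within-class scatter per feature
    for c in np.unique(y):
        Xc = X[y == c]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    # eps (an assumption) avoids division by zero for constant features
    return between / (within + eps)

def multilabel_fisher_score(X, Y):
    """Plain average of per-label Fisher scores.

    Assumption: labels are treated independently here; the paper instead
    weights labels using mutual information between them.
    """
    Y = np.asarray(Y)
    per_label = [fisher_score(X, Y[:, k]) for k in range(Y.shape[1])]
    return np.mean(per_label, axis=0)

# Toy data: feature 0 separates the two classes, feature 1 does not.
X_demo = np.array([[0.0, 1.0], [0.1, 2.0], [1.0, 1.0], [1.1, 2.0]])
y_demo = np.array([0, 0, 1, 1])
demo_scores = fisher_score(X_demo, y_demo)
```

Features with a higher score separate the classes better relative to their within-class spread, so a filter step would keep the top-ranked features before any rough-set-based reduction.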

Pages: 887-912

Page count: 26

Related records: 50 in total

[21] Feature selection using neighborhood uncertainty measures and Fisher score for gene expression data classification
Jiucheng Xu
Kanglin Qu
Kangjian Qu
Qincheng Hou
Xiangru Meng
International Journal of Machine Learning and Cybernetics, 2023, 14 : 4011 - 4028
[22] Feature selection using neighborhood uncertainty measures and Fisher score for gene expression data classification
Xu, Jiucheng
Qu, Kanglin
Qu, Kangjian
Hou, Qincheng
Meng, Xiangru
INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, 14 (12) : 4011 - 4028
[23] Memetic multilabel feature selection using pruned refinement process
Seo, Wangduk
Park, Jaegyun
Lee, Sanghyuck
Moon, A-Seong
Kim, Dae-Won
Lee, Jaesung
JOURNAL OF BIG DATA, 2024, 11 (01)
[24] Granule-specific feature selection for continuous data classification using neighborhood rough sets
Sewwandi, Mahawaga Arachchige Nayomi Dulanjala
Li, Yuefeng
Zhang, Jinglan
EXPERT SYSTEMS WITH APPLICATIONS, 2024, 238
[25] Feature Selection via Label Enhancement and Weighted Neighborhood Mutual Information for Multilabel Data
Sun, Lin
Guo, Jiaqi
Wu, Xuejiao
Xu, Jiucheng
ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT II, ICIC 2024, 2024, 14876 : 470 - 480
[26] Cost-constrained feature selection in multilabel classification using an information-theoretic approach
Klonecki, Tomasz
Teisseyre, Pawel
Lee, Jaesung
PATTERN RECOGNITION, 2023, 141
[27] Sparse Representation: Extract Adaptive Neighborhood for Multilabel Classification
Xiang, Shuo
Chen, Songcan
Qiao, Lishan
PRICAI 2010: TRENDS IN ARTIFICIAL INTELLIGENCE, 2010, 6230 : 304 - 314
[28] Joint multilabel classification and feature selection based on deep canonical correlation analysis
Dai, Liang
Du, Guodong
Zhang, Jia
Li, Candong
Wei, Rong
Li, Shaozi
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2020, 32 (22):
[29] Feature selection for multi-label classification based on neighborhood rough sets
Duan, Jie
Hu, Qinghua
Zhang, Lingjun
Qian, Yuhua
Li, Deyu
Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2015, 52 (01) : 56 - 65
[30] Feature selection for blind image steganalysis using neighborhood rough sets
Chen, Yingyue
Chen, Yumin
Yin, Aimin
JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2019, 37 (03) : 3709 - 3720