Hierarchical Feature Selection Based on Label Distribution Learning

Cited by: 34
Authors
Lin, Yaojin [1 ]
Liu, Haoyang [1 ]
Zhao, Hong [1 ]
Hu, Qinghua [2 ]
Zhu, Xingquan [3 ]
Wu, Xindong [4 ]
Affiliations
[1] Minnan Normal Univ, Sch Comp Sci, Key Lab Data Sci & Intelligence Applicat, Zhangzhou 363000, Fujian, Peoples R China
[2] Tianjin Univ, Sch Comp Sci, Tianjin 300354, Peoples R China
[3] Florida Atlantic Univ, Dept Elect Engn & Comp Sci, Boca Raton, FL 33431 USA
[4] Hefei Univ Technol, Key Lab Knowledge Engn Big Data, Minist Educ, Hefei 230009, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Task analysis; Correlation; Electronic mail; Training; Dinosaurs; Computer science; Common and label-specific features; feature selection; hierarchical classification; label distribution learning; label enhancement; CLASSIFICATION;
DOI
10.1109/TKDE.2022.3177246
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Hierarchical classification learning, which organizes data categories into a hierarchical structure, is an effective approach for large-scale classification tasks. The high dimensionality of the data feature space represented in hierarchical class structures is one of the main research challenges. In addition, the class hierarchy often introduces imbalanced class distributions and causes overfitting. In this paper, we propose a feature selection method based on label distribution learning to address the above challenges. The crux is to alleviate the class imbalance problem and learn a discriminative feature subset for the hierarchical classification process. Because of the correlation between class categories in the hierarchical tree structure, sibling categories can provide additional supervisory information for each learning subtask, which in turn alleviates the under-sampling problem of minority categories. Therefore, we transform hierarchical labels into a hierarchical label distribution to represent this correlation. A discriminative feature subset is then selected recursively, under common-feature and label-specific-feature constraints, so that downstream classification tasks achieve the best performance. Experiments and comparisons with seven well-established feature selection algorithms on six real data sets with different degrees of imbalance demonstrate the superiority of the proposed method.
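The label-enhancement idea described in the abstract, converting a hard hierarchical label into a distribution that also assigns supervisory weight to sibling categories, can be illustrated with a minimal sketch. The toy class hierarchy, the sibling weight alpha, and the equal-sharing normalization below are assumptions made for illustration only; they are not the paper's exact formulation.

# Illustrative sketch (not the authors' exact method): turn a hard leaf label
# from a class hierarchy into a label distribution in which sibling categories
# share part of the supervisory weight, so minority classes are no longer
# supervised by a single one-hot target.
# The hierarchy, the weight `alpha`, and the sharing scheme are assumptions.

import numpy as np

# A toy class hierarchy: parent -> children (leaves are the prediction targets).
HIERARCHY = {
    "root":    ["animal", "vehicle"],
    "animal":  ["cat", "dog"],
    "vehicle": ["car", "truck"],
}
LEAVES = ["cat", "dog", "car", "truck"]


def siblings(label):
    """Return the sibling leaves of `label` under the same parent."""
    for parent, children in HIERARCHY.items():
        if label in children:
            return [c for c in children if c != label]
    return []


def to_label_distribution(label, alpha=0.2):
    """Convert a hard leaf label into a distribution over all leaves.

    The true label keeps weight (1 - alpha); `alpha` is shared equally among
    its siblings, reflecting that sibling categories are correlated in the
    hierarchy. Non-sibling leaves receive zero weight.
    """
    dist = np.zeros(len(LEAVES))
    dist[LEAVES.index(label)] = 1.0 - alpha
    sibs = siblings(label)
    for s in sibs:
        dist[LEAVES.index(s)] = alpha / len(sibs)
    return dist


if __name__ == "__main__":
    # "cat" keeps most of the weight; its sibling "dog" receives the rest.
    print(dict(zip(LEAVES, to_label_distribution("cat"))))
    # {'cat': 0.8, 'dog': 0.2, 'car': 0.0, 'truck': 0.0}

In this sketch the enhanced targets could then be fed to any feature selection criterion that accepts soft labels; the paper additionally selects features recursively along the hierarchy under common-feature and label-specific-feature constraints, which the sketch does not attempt to reproduce.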
Pages: 5964 - 5976
Page count: 13
Related Papers (50 in total)
  • [1] Label distribution feature selection based on hierarchical structure and neighborhood granularity
    Lu, Xiwen
    Qian, Wenbin
    Dai, Shiming
    Huang, Jintao
    INFORMATION FUSION, 2024, 112
  • [2] Feature selection for label distribution learning via feature similarity and label correlation
    Qian, Wenbin
    Xiong, Yinsong
    Yang, Jun
    Shu, Wenhao
    INFORMATION SCIENCES, 2022, 582 : 38 - 59
  • [3] Mutual information-based label distribution feature selection for multi-label learning
    Qian, Wenbin
    Huang, Jintao
    Wang, Yinglong
    Shu, Wenhao
    KNOWLEDGE-BASED SYSTEMS, 2020, 195
  • [4] Multi-label feature selection based on label distribution and feature complementarity
    Qian, Wenbin
    Long, Xuandong
    Wang, Yinglong
    Xie, Yonghong
    APPLIED SOFT COMPUTING, 2020, 90
  • [5] Feature selection for label distribution learning under feature weight view
    Lin, Shidong
    Wang, Chenxi
    Mao, Yu
    Lin, Yaojin
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (05) : 1827 - 1840
  • [6] Feature selection for label distribution learning based on neighborhood fuzzy rough sets
    Deng, Zhixuan
    Li, Tianrui
    Zhang, Pengfei
    Liu, Keyu
    Yuan, Zhong
    Deng, Dayong
    APPLIED SOFT COMPUTING, 2025, 169
  • [7] Hierarchical Classification Based on Label Distribution Learning
    Xu, Changdong
    Geng, Xin
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 5533 - 5540
  • [8] Feature selection for label distribution learning based on the statistical distribution of data and fuzzy mutual information
    You, Hengyan
    Wang, Pei
    Li, Zhaowen
    INFORMATION SCIENCES, 2024, 679
  • [9] Partial label feature selection based on noisy manifold and label distribution
    Qian, Wenbin
    Liu, Jiale
    Yang, Wenji
    Huang, Jintao
    Ding, Weiping
    PATTERN RECOGNITION, 2024, 156