Distributed Selection of Continuous Features in Multilabel Classification Using Mutual Information

Cited by: 44
Authors
Gonzalez-Lopez, Jorge [1 ]
Ventura, Sebastian [2 ]
Cano, Alberto [1 ]
Affiliations
[1] Virginia Commonwealth Univ, Dept Comp Sci, Med Coll Virginia Campus, Richmond, VA 23284 USA
[2] Univ Cordoba, Dept Comp Sci & Numer Anal, E-14071 Cordoba, Spain
Keywords
Feature extraction; Entropy; Redundancy; Learning systems; Mutual information; Computational modeling; Cluster computing; Apache Spark; distributed computing; feature selection; multilabel learning; mutual information (MI); LABEL FEATURE-SELECTION; TRANSFORMATION;
DOI
10.1109/TNNLS.2019.2944298
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Multilabel learning is a challenging task demanding scalable methods for large-scale data. Feature selection has been shown to improve multilabel accuracy while defying the curse of dimensionality of high-dimensional scattered data. However, the increasing complexity of multilabel feature selection, especially on continuous features, requires new approaches to manage data effectively and efficiently in distributed computing environments. This article proposes a distributed model for mutual information (MI) adaptation on continuous features and multiple labels on Apache Spark. Two approaches are presented, based on MI maximization (MIM) and on minimum redundancy and maximum relevance (mRMR). The former selects the subset of features that maximizes the MI between the features and the labels, whereas the latter additionally minimizes the redundancy between the features. Experiments compare the distributed multilabel feature selection methods on 10 data sets and 12 metrics. Results validated through statistical analysis indicate that our methods outperform reference methods for distributed feature selection on multilabel data, while MIM also reduces the runtime by orders of magnitude.
Pages: 2280-2293
Number of pages: 14
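
As a rough illustration of the two selection criteria described in the abstract, the following single-machine Python sketch implements MIM and a greedy mRMR ranking on small in-memory data. It is only an assumption-laden stand-in, not the paper's distributed Apache Spark algorithm: scikit-learn's mutual_info_classif and mutual_info_score are used as substitute MI estimators, and continuous features are simply discretized for the feature-feature redundancy term.

# Minimal single-machine sketch of MI maximization (MIM) and greedy
# minimum-redundancy maximum-relevance (mRMR) feature selection for
# multilabel data. Illustrative only; not the authors' Spark implementation.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score


def mim_select(X, Y, k):
    """Keep the k features with the largest summed MI against all labels."""
    # relevance[j] = sum over labels l of I(X_j; Y_l), estimated per label.
    relevance = sum(mutual_info_classif(X, Y[:, l]) for l in range(Y.shape[1]))
    return np.argsort(relevance)[::-1][:k]


def mrmr_select(X, Y, k, n_bins=10):
    """Greedily pick the feature maximizing (relevance to the labels) minus
    (mean redundancy with the already-selected features)."""
    relevance = sum(mutual_info_classif(X, Y[:, l]) for l in range(Y.shape[1]))
    # Discretize continuous features so pairwise feature-feature MI can be
    # estimated with a simple contingency-table estimator.
    Xd = np.column_stack(
        [np.digitize(col, np.histogram_bin_edges(col, bins=n_bins)[1:-1])
         for col in X.T])
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = np.mean([mutual_info_score(Xd[:, j], Xd[:, s])
                                  for s in selected])
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return np.array(selected)


# Toy usage: 200 samples, 20 continuous features, 3 binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
Y = (X[:, :3] + 0.5 * rng.normal(size=(200, 3)) > 0).astype(int)
print("MIM :", mim_select(X, Y, 5))
print("mRMR:", mrmr_select(X, Y, 5))
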
Related Papers (50 in total)
  • [1] Shi, Enhui; Sun, Lin; Xu, Jiucheng; Zhang, Shiguang. Multilabel Feature Selection Using Mutual Information and ML-ReliefF for Multilabel Classification. IEEE ACCESS, 2020, 8: 145381-145400.
  • [2] Doquire, Gauthier; Verleysen, Michel. Mutual information-based feature selection for multilabel classification. NEUROCOMPUTING, 2013, 122: 148-155.
  • [3] Tehrani, Ali Fallah. Modeling andness in multilabel classification to recognize mutual information. PATTERN RECOGNITION LETTERS, 2023, 167: 98-106.
  • [4] Sun, Lin; Yin, Tengyu; Ding, Weiping; Qian, Yuhua; Xu, Jiucheng. Multilabel feature selection using ML-ReliefF and neighborhood mutual information for multilabel neighborhood decision systems. INFORMATION SCIENCES, 2020, 537: 401-424.
  • [5] Sel, Ilhami; Karci, Ali; Hanbay, Davut. Feature Selection for Text Classification Using Mutual Information. 2019 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP 2019), 2019.
  • [6] Lin, Yaojin; Hu, Qinghua; Liu, Jinghua; Li, Jinjin; Wu, Xindong. Streaming Feature Selection for Multilabel Learning Based on Fuzzy Mutual Information. IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2017, 25(06): 1491-1507.
  • [7] Teisseyre, Pawel; Lee, Jaesung. Multilabel all-relevant feature selection using lower bounds of conditional mutual information. EXPERT SYSTEMS WITH APPLICATIONS, 2023, 216.
  • [8] Dai, Jianhua; Liu, Qi; Chen, Wenxiang; Zhang, Chucai. Multilabel Feature Selection Based on Fuzzy Mutual Information and Orthogonal Regression. IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2024, 32(09): 5136-5148.
  • [9] Xu, Zhen; Liu, Ying; Li, Chunguang. Distributed Information-Theoretic Semisupervised Learning for Multilabel Classification. IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52(02): 821-835.
  • [10] Wells, G; Torras, C. Selection of image features for robot positioning using mutual information. 1998 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS 1-4, 1998: 2819-2826.