Rough set-based feature selection for weakly labeled data

Cited by: 29
Authors
Campagner, Andrea [1]
Ciucci, Davide [1]
Huellermeier, Eyke [2]
Affiliations
[1] Univ Milano Bicocca, Dept Informat Syst & Commun, Viale Sarca 336, I-20126 Milan, Italy
[2] Univ Munich LMU, Inst Informat, Munich, Germany
Keywords
Superset Learning; Rough Sets; Feature Selection; Evidence Theory; Entropy; DEMPSTER-SHAFER THEORY; TOTAL UNCERTAINTY; BELIEF FUNCTIONS; CLASSIFICATION; ENTROPY; INFORMATION; RULE;
DOI
10.1016/j.ijar.2021.06.005
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Supervised learning is an important branch of machine learning (ML), which requires a complete annotation (labeling) of the involved training data. This assumption is relaxed in the settings of weakly supervised learning, where labels are allowed to be imprecise or partial. In this article, we study the setting of superset learning, in which instances are assumed to be labeled with a set of possible annotations containing the correct one. We tackle the problem of learning from such data in the context of rough set theory (RST). More specifically, we consider the problem of RST-based feature reduction as a suitable means for data disambiguation, i.e., for the purpose of figuring out the most plausible precise instantiation of the imprecise training data. To this end, we define appropriate generalizations of decision tables and reducts, using tools from generalized information theory and belief function theory. Moreover, we analyze the computational complexity and theoretical properties of the associated computational problems. Finally, we present results of a series of experiments, in which we analyze the proposed concepts empirically and compare our methods with a state-of-the-art dimensionality reduction algorithm, reporting a statistically significant improvement in predictive accuracy. (C) 2021 Elsevier Inc. All rights reserved.
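To make the abstract's idea of using feature reduction for disambiguation more concrete, here is a minimal Python sketch under simplifying assumptions of our own. It is not the authors' definition of a generalized reduct, which the abstract says relies on generalized information theory and belief function theory. In this sketch each instance carries a candidate label set (the superset-learning setting). A feature subset counts as acceptable if some choice of one label per instance makes the decision table restricted to that subset consistent. The search returns a smallest such subset. Such a choice exists exactly when, inside each indiscernibility class, the candidate sets share at least one label. The function names, the toy data, and the `min`-based tie-breaking are hypothetical illustrations.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Sequence, Set, Tuple

Row = Tuple[str, ...]


def class_intersections(
    rows: Sequence[Row], candidates: Sequence[Set[str]], features: Tuple[int, ...]
) -> Dict[Row, FrozenSet[str]]:
    """Group instances into indiscernibility classes w.r.t. `features` and
    intersect the candidate label sets inside each class."""
    groups: Dict[Row, FrozenSet[str]] = {}
    for row, cand in zip(rows, candidates):
        key = tuple(row[i] for i in features)
        groups[key] = groups[key] & cand if key in groups else frozenset(cand)
    return groups


def disambiguate(
    rows: Sequence[Row], candidates: Sequence[Set[str]], features: Tuple[int, ...]
) -> Optional[List[str]]:
    """Return one label per instance that makes the table restricted to
    `features` consistent, or None if no such choice exists."""
    groups = class_intersections(rows, candidates, features)
    if any(not shared for shared in groups.values()):
        return None  # some class has no label common to all its members
    # Hypothetical tie-break (not from the paper): smallest shared label per class.
    return [min(groups[tuple(row[i] for i in features)]) for row in rows]


def smallest_consistent_subset(
    rows: Sequence[Row], candidates: Sequence[Set[str]]
) -> Optional[Tuple[int, ...]]:
    """Exhaustively try feature subsets by increasing size (exponential cost)."""
    n_features = len(rows[0]) if rows else 0
    for size in range(n_features + 1):
        for subset in combinations(range(n_features), size):
            if disambiguate(rows, candidates, subset) is not None:
                return subset
    return None  # not even the full feature set admits a consistent labeling


# Hypothetical toy table: 3 categorical features, candidate label sets.
rows = [("a", "x", "p"), ("a", "y", "p"), ("b", "x", "q"), ("b", "y", "q")]
candidates = [{"1", "2"}, {"1"}, {"2", "3"}, {"3"}]

best = smallest_consistent_subset(rows, candidates)
print(best)                                  # (0,)
print(disambiguate(rows, candidates, best))  # ['1', '1', '3', '3']
```

On this toy table, feature 0 alone suffices and yields the disambiguation `['1', '1', '3', '3']`. The empty subset and feature 1 alone do not suffice, because some class then has no label shared by all its members. The exhaustive search is exponential in the number of features, so it only illustrates the definition and is not meant as a practical algorithm.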
Pages: 150-167
Number of pages: 18
Related papers
50 in total
  • [41] A rough set-based fuzzy clustering
    Zhao, YQ
    Zhou, XZ
    Tang, GZ
    [J]. INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS, 2005, 3689 : 401 - 409
  • [42] Locally linear embedding and neighborhood rough set-based gene selection for gene expression data classification
    Sun, L.
    Xu, J. -C.
    Wang, W.
    Yin, Y.
    [J]. GENETICS AND MOLECULAR RESEARCH, 2016, 15 (03)
  • [43] A rough set approach to feature selection based on power set tree
    Chen, Yumin
    Miao, Duoqian
    Wang, Ruizhi
    Wu, Keshou
    [J]. KNOWLEDGE-BASED SYSTEMS, 2011, 24 (02) : 275 - 281
  • [44] Rough set model based feature selection for mixed-type data with feature space decomposition
    Kim, Kyung-Jun
    Jun, Chi-Hyuck
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2018, 103 : 196 - 205
  • [45] New Online Streaming Feature Selection Based on Neighborhood Rough Set for Medical Data
    Lei, Dingfei
    Liang, Pei
    Hu, Junhua
    Yuan, Yuan
    [J]. SYMMETRY-BASEL, 2020, 12 (10) : 1 - 31
  • [46] Attribute Selection for Partially Labeled Categorical Data By Rough Set Approach
    Dai, Jianhua
    Hu, Qinghua
    Zhang, Jinghong
    Hu, Hu
    Zheng, Nenggan
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2017, 47 (09) : 2460 - 2471
  • [47] On the Definability of a Set and Rough Set-Based Rule Generation
    Sakai, Hiroshi
    Wu, Mao
    Yamaguchi, Naoto
    [J]. 2014 IIAI 3RD INTERNATIONAL CONFERENCE ON ADVANCED APPLIED INFORMATICS (IIAI-AAI 2014), 2014 : 122 - 125
  • [48] A rough set-based approach to handling uncertainty in geographic data classification
    Jankowski, Piotr
    [J]. GEOGRAPHIC UNCERTAINTY IN ENVIRONMENTAL SECURITY, 2007 : 75 - 87
  • [49] A relational perspective of attribute reduction in rough set-based data analysis
    Fan, Tuan-Fang
    Liau, Churn-Jung
    Liu, Duen-Ren
    [J]. EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2011, 213 (01) : 270 - 278
  • [50] A Rough Set-Based Data Analysis in Power System for Fault Diagnosis
    Ren, Dajiang
    [J]. INFORMATION COMPUTING AND APPLICATIONS, PT II, 2011, 244 : 265 - 272