Query-learning-based iterative feature-subset selection for learning from high-dimensional data sets

Author
Hiroshi Mamitsuka
Affiliation
[1] Kyoto University,Institute for Chemical Research
Keywords
Query learning; Feature-subset selection; High-dimensional data set; Uncertainty sampling; Drug design;
DOI
Not available
Abstract
We propose a new data-mining method that is effective for learning from extremely high-dimensional data sets. The method selects a subset of features from a high-dimensional data set by a process of iterative refinement. Each selection of a feature subset has two steps. The first step selects, from the data set, the subset of instances for which the predictions of previously obtained hypotheses are most unreliable. The second step selects the subset of features whose values in the selected instances differ most from their values across all instances of the database. We empirically evaluate the effectiveness of the proposed method by comparing its performance with that of four other methods, including one of the latest feature-subset selection methods. The evaluation was performed on a real-world data set with approximately 140,000 features. Our results show that the proposed method outperforms the other methods in prediction accuracy, precision at a certain recall value, and computation time to reach a certain prediction accuracy. We also examined the effect of noise in the data and found that the advantage of the proposed method becomes more pronounced at higher noise levels. Extended abstracts of parts of the work presented in this paper have appeared in Mamitsuka [14] and Mamitsuka [15].
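The two selection steps described in the abstract can be illustrated with a minimal Python sketch of a single round. The abstract does not say how unreliability or variation is measured, so everything concrete here is an assumption made for illustration: unreliability is taken to be a small vote margin among a committee of binary hypotheses, and a feature's score is the absolute difference between its mean over the selected instances and its mean over all instances. The names `vote_margin`, `select_instances`, `select_features`, and `threshold_rule`, as well as the toy data, are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence

Instance = Sequence[float]
Hypothesis = Callable[[Instance], int]  # assumed to return a label in {0, 1}


def vote_margin(hypotheses: List[Hypothesis], x: Instance) -> float:
    """Assumed unreliability proxy: |votes for 1 - votes for 0| / committee size."""
    ones = sum(h(x) for h in hypotheses)
    return abs(2 * ones - len(hypotheses)) / len(hypotheses)


def select_instances(data: List[Instance], hypotheses: List[Hypothesis],
                     n_instances: int) -> List[int]:
    """Step 1: indices of the instances with the smallest vote margin."""
    order = sorted(range(len(data)),
                   key=lambda i: vote_margin(hypotheses, data[i]))
    return order[:n_instances]


def select_features(data: List[Instance], instance_idx: List[int],
                    n_features: int) -> List[int]:
    """Step 2: features whose mean over the chosen instances deviates most
    from the mean over all instances (an assumed measure of variation)."""
    def mean(rows, j):
        rows = list(rows)
        return sum(data[i][j] for i in rows) / len(rows)

    n_cols = len(data[0])
    score = [abs(mean(instance_idx, j) - mean(range(len(data)), j))
             for j in range(n_cols)]
    order = sorted(range(n_cols), key=lambda j: score[j], reverse=True)
    return sorted(order[:n_features])


def threshold_rule(t: float) -> Hypothesis:
    """Hypothetical stand-in for a previously learned hypothesis."""
    return lambda x: int(x[0] > t)


# Toy data: 5 instances, 3 features (invented for this example).
data = [
    [0.10, 0.0, 1.0],
    [0.50, 1.0, 1.0],
    [0.90, 0.0, 1.0],
    [0.45, 1.0, 1.0],
    [0.30, 0.0, 1.0],
]
committee = [threshold_rule(t) for t in (0.2, 0.4, 0.6, 0.8)]

uncertain = select_instances(data, committee, 2)   # -> [1, 3]
chosen = select_features(data, uncertain, 1)       # -> [1]
print(uncertain, chosen)
```

In the iterative setting the abstract describes, a new hypothesis would presumably be trained on the selected instances restricted to the selected features and added to the committee before the next round; that training step is left out because the abstract does not specify the learner.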
Pages: 91-108
Page count: 17