Query-learning-based iterative feature-subset selection for learning from high-dimensional data sets

Cited by: 0
Author
Hiroshi Mamitsuka
Affiliation
[1] Kyoto University, Institute for Chemical Research
Source
Keywords
Query learning; Feature-subset selection; High-dimensional data set; Uncertainty sampling; Drug design;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
We propose a new data-mining method that is effective for learning from extremely high-dimensional data sets. The method selects a subset of features from a high-dimensional data set by iterative refinement. Each selection of a feature subset has two steps. The first step selects, from the data set, a subset of instances for which the predictions of previously obtained hypotheses are most unreliable. The second step selects a subset of features whose values in the selected instances vary the most from their values in all instances of the database. We empirically evaluate the effectiveness of the proposed method by comparing its performance with that of four other methods, including one of the latest feature-subset selection methods. The evaluation was performed on a real-world data set with approximately 140,000 features. Our results show that the proposed method outperforms the other methods in terms of prediction accuracy, precision at a certain recall value, and computation time to reach a certain prediction accuracy. We have also examined the effect of noise in the data and found that the advantage of the proposed method becomes more pronounced at larger noise levels. Extended abstracts of parts of the work presented in this paper have appeared in Mamitsuka [14] and Mamitsuka [15].
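The two selection steps described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how unreliability or variation is measured, so both measures below are assumptions made for illustration: unreliability is taken as how evenly a committee of earlier hypotheses splits its binary votes, and variation is taken as the absolute difference between a feature's mean over the selected instances and its mean over all instances. The function names are hypothetical and this is not the author's implementation.

```python
def select_uncertain_instances(votes, n_instances):
    """Step 1 (assumed measure): pick instances with the most evenly split votes.

    votes: list of hypotheses, each a list of 0/1 predictions per instance.
    """
    n_hyp = len(votes)
    n_samples = len(votes[0])
    margins = []
    for j in range(n_samples):
        frac_positive = sum(h[j] for h in votes) / n_hyp
        # A margin of 0 means the committee is split exactly in half.
        margins.append(abs(frac_positive - 0.5))
    return sorted(range(n_samples), key=lambda j: margins[j])[:n_instances]


def column_means(rows):
    """Mean of each feature (column) over the given rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def select_features(X, instance_idx, n_features):
    """Step 2 (assumed measure): pick features whose mean over the selected
    instances differs most from the mean over all instances."""
    mean_all = column_means(X)
    mean_sel = column_means([X[i] for i in instance_idx])
    diff = [abs(s - a) for s, a in zip(mean_sel, mean_all)]
    return sorted(range(len(diff)), key=lambda k: -diff[k])[:n_features]
```

In an iterative scheme of the kind the abstract describes, a new hypothesis would presumably be trained on the selected features and added to the set of hypotheses before the two steps are repeated; that loop is omitted here because the abstract gives no details on it.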
Pages: 91-108
Number of pages: 17
Related papers
50 in total
  • [31] A filter feature selection for high-dimensional data
    Janane, Fatima Zahra
    Ouaderhman, Tayeb
    Chamlal, Hasna
    [J]. JOURNAL OF ALGORITHMS & COMPUTATIONAL TECHNOLOGY, 2023, 17
  • [32] Feature Selection with High-Dimensional Imbalanced Data
    Van Hulse, Jason
    Khoshgoftaar, Taghi M.
    Napolitano, Amri
    Wald, Randall
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2009), 2009, : 507 - 514
  • [34] Feature selection for high-dimensional temporal data
    Tsagris, Michail
    Lagani, Vincenzo
    Tsamardinos, Ioannis
    [J]. BMC BIOINFORMATICS, 2018, 19
  • [35] FEATURE SELECTION FOR HIGH-DIMENSIONAL DATA ANALYSIS
    Verleysen, Michel
    [J]. ECTA 2011/FCTA 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON EVOLUTIONARY COMPUTATION THEORY AND APPLICATIONS AND INTERNATIONAL CONFERENCE ON FUZZY COMPUTATION THEORY AND APPLICATIONS, 2011,
  • [36] Survey on Feature Subset Selection for High Dimensional Data
    Shahana, A. H.
    Preeja, V
    [J]. PROCEEDINGS OF IEEE INTERNATIONAL CONFERENCE ON CIRCUIT, POWER AND COMPUTING TECHNOLOGIES (ICCPCT 2016), 2016,
  • [37] Feature Subset Selection for High-Dimensional, Low Sampling Size Data Classification Using Ensemble Feature Selection With a Wrapper-Based Search
    Mandal, Ashis Kumar
    Nadim, MD.
    Saha, Hasi
    Sultana, Tangina
    Hossain, Md. Delowar
    Huh, Eui-Nam
    [J]. IEEE ACCESS, 2024, 12 : 62341 - 62357
  • [38] A Novel Feature Selection-Based Sequential Ensemble Learning Method for Class Noise Detection in High-Dimensional Data
    Chen, Kai
    Guan, Donghai
    Yuan, Weiwei
    Li, Bohan
    Khattak, Asad Masood
    Alfandi, Omar
    [J]. ADVANCED DATA MINING AND APPLICATIONS, ADMA 2018, 2018, 11323 : 55 - 65
  • [39] Identification of abnormal conditions in high-dimensional chemical process based on feature selection and deep learning
    Wende Tian
    Zijian Liu
    Lening Li
    Shifa Zhang
    Chuankun Li
    [J]. Chinese Journal of Chemical Engineering, 2020, 28 (07) : 1875 - 1883
  • [40] Ensemble learning-based filter-centric hybrid feature selection framework for high-dimensional imbalanced data
    Kim, Jongmo
    Kang, Jaewoong
    Sohn, Mye
    [J]. KNOWLEDGE-BASED SYSTEMS, 2021, 220