Query-learning-based iterative feature-subset selection for learning from high-dimensional data sets

Cited by: 0

Author:

Hiroshi Mamitsuka

Affiliation:

[1] Kyoto University, Institute for Chemical Research

Source:

Knowledge and Information Systems | 2006 / Vol. 9

Keywords:

Query learning; Feature-subset selection; High-dimensional data set; Uncertainty sampling; Drug design;

DOI:

Not available

Abstract:

We propose a new data-mining method that is effective for learning from extremely high-dimensional data sets. The method selects a subset of features from a high-dimensional data set by iterative refinement. Each selection has two steps. The first step selects, from the data set, the subset of instances for which the predictions of previously obtained hypotheses are most unreliable. The second step selects the subset of features whose values in the selected instances vary the most from their values across all instances of the database. We empirically evaluate the method by comparing its performance with that of four other methods, including one of the latest feature-subset selection methods. The evaluation was performed on a real-world data set with approximately 140,000 features. Our results show that the proposed method outperforms the other methods in prediction accuracy, precision at a certain recall value, and computation time to reach a certain prediction accuracy. We also examined the effect of noise in the data and found that the advantage of the proposed method becomes more pronounced at larger noise levels. Extended abstracts of parts of the work presented in this paper have appeared in Mamitsuka [14] and Mamitsuka [15].
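
The two selection steps described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the use of binary votes from earlier hypotheses as the reliability signal, and the mean-difference score for how much a feature "varies" are all assumptions made for illustration, since the abstract does not specify them. Training the hypotheses and iterating the procedure are omitted.

```python
def select_uncertain_instances(votes, k):
    """Step 1 (assumed form): pick the k instances whose earlier predictions
    are least reliable, i.e. whose 0/1 votes are closest to an even split.

    votes[i] is the list of 0/1 predictions made for instance i by the
    hypotheses obtained so far (a hypothetical input format).
    """
    def margin(i):
        v = votes[i]
        # 0 means a perfect tie; len(v) means a unanimous vote.
        return abs(2 * sum(v) - len(v))
    return sorted(range(len(votes)), key=margin)[:k]


def select_divergent_features(X, rows, m):
    """Step 2 (assumed form): pick the m features whose mean value over the
    selected rows differs most from the mean over all rows of X."""
    n = len(X)

    def gap(j):
        all_mean = sum(x[j] for x in X) / n
        sub_mean = sum(X[i][j] for i in rows) / len(rows)
        return abs(sub_mean - all_mean)

    return sorted(range(len(X[0])), key=gap, reverse=True)[:m]


# Toy data: 4 instances, 3 binary features, 3 earlier hypotheses.
X = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 0]]
votes = [[1, 1, 1], [1, 0, 1], [0, 1, 0], [0, 0, 0]]

rows = select_uncertain_instances(votes, k=2)
features = select_divergent_features(X, rows, m=2)
print(rows, features)
```

In a full procedure, a new hypothesis would be trained on the selected feature subset and its predictions added to `votes` before the next round; other disagreement or divergence measures could be substituted for the two scores used here.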

Pages: 91-108

Page count: 17

50 records in total

[1] Query-learning-based iterative feature-subset selection for learning from high-dimensional data sets
Mamitsuka, H
[J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2006, 9 (01) : 91 - 108
[2] Local-Learning-Based Feature Selection for High-Dimensional Data Analysis
Sun, Yijun
Todorovic, Sinisa
Goodison, Steve
[J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2010, 32 (09) : 1610 - 1626
[3] Feature Selection and Feature Learning for High-dimensional Batch Reinforcement Learning: A Survey
Liu, De-Rong
Li, Hong-Liang
Wang, Ding
[J]. INTERNATIONAL JOURNAL OF AUTOMATION AND COMPUTING, 2015, 12 (03) : 229 - 242
[5] A Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data
Song, Qinbao
Ni, Jingjie
Wang, Guangtao
[J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (01) : 1 - 14
[6] Feature Subset Selection Approach Based on Fuzzy Rough Set for High-dimensional Data
Guo, Changyou
Zheng, Xuefeng
[J]. 2014 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING (GRC), 2014, : 72 - 75
[7] Efficient Learning and Feature Selection in High-Dimensional Regression
Ting, Jo-Anne
D'Souza, Aaron
Vijayakumar, Sethu
Schaal, Stefan
[J]. NEURAL COMPUTATION, 2010, 22 (04) : 831 - 886
[8] Empirical evaluation of ensemble feature subset selection methods for learning from a high-dimensional database in drug design
Mamitsuka, H
[J]. THIRD IEEE SYMPOSIUM ON BIOINFORMATICS AND BIOENGINEERING - BIBE 2003, PROCEEDINGS, 2003, : 253 - 257
[9] WrapperRL: Reinforcement Learning Agent for Feature Selection in High-Dimensional Industrial Data
Shaer, Ibrahim
Shami, Abdallah
[J]. IEEE ACCESS, 2024, 12 : 128338 - 128348
[10] A Feature Subset Selection Method Based On High-Dimensional Mutual Information
Zheng, Yun
Kwoh, Chee Keong
[J]. ENTROPY, 2011, 13 (04) : 860 - 901
