Empirical evaluation of ensemble feature subset selection methods for learning from a high-dimensional database in drug design

Times cited: 3
Author
Mamitsuka, H. [1]
Affiliation
[1] Kyoto Univ, Inst Chem Res, Uji 6110011, Japan
DOI
10.1109/BIBE.2003.1188959
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Discovering a new drug is one of the most important goals not only in the pharmaceutical field but also in a variety of other fields, including molecular biology, chemistry and medical science. The importance of computationally understanding the relationship between a given chemical compound and its drug activity has been increasingly recognized. In a data set describing the drug activity of chemical compounds, each row corresponds to a chemical compound, and the columns are the descriptors of the compound plus a label indicating its drug activity. Recently, the number of descriptors has grown in order to obtain more detailed information from a given set of compounds; in fact, the number of columns (attributes or features) in some drug data sets reaches hundreds of thousands or even a million. The purpose of this paper is to empirically evaluate the performance of ensemble feature subset selection strategies by applying them to such a high-dimensional data set actually used in the process of drug design. We examined the performance of three ensemble methods, including a query learning based method, and compared it with that of one of the latest feature subset selection methods. The evaluation was performed on a data set containing approximately 140,000 features. Our results show that the query learning based method outperformed the other three methods in terms of both final prediction accuracy and time efficiency. We also examined the effect of noise in the data and found that the advantage of the query learning based method becomes more pronounced at higher noise levels.
Pages: 253-257
Number of pages: 5