Empirical evaluation of ensemble feature subset selection methods for learning from a high-dimensional database in drug design

Cited by: 3
Author
Mamitsuka, H [1]
Affiliation
[1] Kyoto Univ, Inst Chem Res, Uji 6110011, Japan
DOI
10.1109/BIBE.2003.1188959
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Discovering new drugs is one of the most important goals not only in the pharmaceutical field but also in a variety of other fields, including molecular biology, chemistry and medical science. The importance of computationally understanding the relationship between a given chemical compound and its drug activity has been increasingly emphasized. In a data set on the drug activity of chemical compounds, each row corresponds to a chemical compound, and the columns are the descriptors of the compound plus a label indicating its drug activity. Recently, the number of descriptors has grown in order to obtain more detailed information from a given set of compounds; in some drug data sets the number of columns (attributes or features) reaches hundreds of thousands or even a million. The purpose of this paper is to empirically evaluate the performance of ensemble feature subset selection strategies by applying them to such a high-dimensional data set actually used in the process of drug design. We examined the performance of three ensemble methods, including a query learning based method, and compared it with that of one of the latest feature subset selection methods. The evaluation was performed on a data set that contains approximately 140,000 features. Our results show that the query learning based method outperformed the other three methods in terms of both final prediction accuracy and time efficiency. We also examined the effect of noise in the data and found that the advantage of the method becomes more pronounced at larger noise levels.
Pages: 253-257
Page count: 5
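Illustrative sketch
The abstract describes a table with one row per compound, descriptor columns, and an activity label, and it evaluates ensemble feature subset selection on such data. The following is a minimal sketch of that general idea only, not the query learning based method or any other method from the paper: each ensemble member scores a random subset of columns on a bootstrap sample of rows, and the scores are averaged per column. The synthetic data, the mean-difference score, and all function and parameter names (`make_data`, `feature_score`, `ensemble_select`, `subset_size`, `top_k`) are assumptions made for illustration.

```python
import random


def make_data(n_rows, n_features, informative, noise, rng):
    """Synthetic stand-in for a compound table (assumption, not real drug data).

    Each row is a list of binary descriptors. The label is 1 when more than
    half of the informative columns are 1, then flipped with probability `noise`.
    """
    rows, labels = [], []
    for _ in range(n_rows):
        x = [rng.randint(0, 1) for _ in range(n_features)]
        y = 1 if sum(x[j] for j in informative) * 2 > len(informative) else 0
        if rng.random() < noise:
            y = 1 - y
        rows.append(x)
        labels.append(y)
    return rows, labels


def feature_score(rows, labels, j):
    """Absolute difference between the class-conditional means of column j."""
    pos = [r[j] for r, y in zip(rows, labels) if y == 1]
    neg = [r[j] for r, y in zip(rows, labels) if y == 0]
    if not pos or not neg:
        return 0.0
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def ensemble_select(rows, labels, n_members, subset_size, top_k, rng):
    """Average per-column scores over ensemble members and keep the top_k columns."""
    n_features = len(rows[0])
    totals = [0.0] * n_features
    counts = [0] * n_features
    for _ in range(n_members):
        # Each member sees a bootstrap sample of the rows ...
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        # ... and scores only a random subset of the columns.
        for j in rng.sample(range(n_features), subset_size):
            totals[j] += feature_score(sample_rows, sample_labels, j)
            counts[j] += 1
    avg = [totals[j] / counts[j] if counts[j] else 0.0 for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: avg[j], reverse=True)
    return ranked[:top_k]


rng = random.Random(0)
informative = [3, 17, 42]  # hypothetical columns that drive the label
rows, labels = make_data(300, 60, informative, noise=0.05, rng=rng)
selected = ensemble_select(rows, labels, n_members=30, subset_size=20,
                           top_k=5, rng=rng)
print(sorted(selected))
```

Because each member touches only `subset_size` columns, the cost per member does not grow with the full column count, which is one reason subset-based ensembles are attractive when the number of descriptors is very large.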