Empirical evaluation of ensemble feature subset selection methods for learning from a high-dimensional database in drug design

Cited by: 3
Author
Mamitsuka, H [1 ]
Affiliation
[1] Kyoto Univ, Inst Chem Res, Uji 6110011, Japan
DOI
10.1109/BIBE.2003.1188959
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Discovering a new drug is one of the most important goals not only in the pharmaceutical field but also in a variety of other fields, including molecular biology, chemistry and medical science. The importance of computationally understanding the relationship between a given chemical compound and its drug activity has been increasingly recognized. In a data set on the drug activity of chemical compounds, each row corresponds to a chemical compound, and the columns are the descriptors of the compound plus a label indicating its drug activity. Recently, the number of descriptors has grown so that more detailed information can be obtained from a given set of compounds; in some drug data sets, the number of columns (attributes or features) reaches hundreds of thousands or even a million. The purpose of this paper is to empirically evaluate the performance of ensemble feature subset selection strategies by applying them to such a high-dimensional data set, one actually used in the process of drug design. We examined the performance of three ensemble methods, including a query learning based method, and compared it with that of one of the latest feature subset selection methods. The evaluation was performed on a data set that contains approximately 140,000 features. Our results show that the query learning based method outperformed the other three methods in terms of both final prediction accuracy and time efficiency. We also examined the effect of noise in the data and found that the advantage of the method becomes more pronounced at higher noise levels.
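As a rough illustration of the general idea behind ensemble feature subset selection, the Python sketch below trains each ensemble member on a random subset of the columns and combines the members by majority vote. This is a plain random-subset ensemble, not the query learning based method evaluated in the paper, whose details are not given in this abstract. The function names, the one-feature "stump" learner, and the synthetic binary data are all assumptions made for illustration.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature_ids):
    """Pick the single binary feature (and polarity) that best fits the labels.

    Returns (feature_index, polarity); the stump predicts 1 when
    row[feature_index] == polarity and 0 otherwise.
    """
    best = None
    for j in feature_ids:
        for polarity in (0, 1):
            correct = sum(
                1 for row, y in zip(rows, labels) if int(row[j] == polarity) == y
            )
            if best is None or correct > best[0]:
                best = (correct, j, polarity)
    return best[1], best[2]


def train_ensemble(rows, labels, n_members=25, subset_size=20, seed=0):
    """Train n_members stumps, each restricted to a random subset of columns."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    size = min(subset_size, n_features)
    members = []
    for _ in range(n_members):
        subset = rng.sample(range(n_features), size)
        members.append(train_stump(rows, labels, subset))
    return members


def predict(members, row):
    """Majority vote over the members (use an odd n_members to avoid ties)."""
    votes = Counter(int(row[j] == polarity) for j, polarity in members)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy data: 60 "compounds", 200 binary "descriptors";
    # the activity label is simply a copy of descriptor 7.
    rng = random.Random(42)
    rows = [[rng.randint(0, 1) for _ in range(200)] for _ in range(60)]
    labels = [row[7] for row in rows]

    members = train_ensemble(rows, labels, n_members=25, subset_size=50)
    hits = sum(predict(members, r) == y for r, y in zip(rows, labels))
    print(f"training accuracy: {hits / len(rows):.2f}")
```

Each member sees only a fraction of the columns, so the cost per member stays small even when the full table has very many features. How well the vote performs depends on how often the informative columns land in a member's subset, which is why smarter ways of choosing subsets are worth evaluating.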
Pages: 253-257
Number of pages: 5