Machine learning with automatic feature selection for multi-class protein fold classification

Cited: 0
Authors
Huang, CD [1]
Liang, SF
Lin, CT
Wu, RC
Affiliations
[1] Hsiuping Inst Technol, Dept Elect Engn, Taichung 412, Taiwan
[2] Natl Chiao Tung Univ, Dept Elect & Control Engn, Hsinchu 300, Taiwan
Keywords
machine learning; hierarchical architecture; feature selection; gate; neural network; protein fold; bioinformatics
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Discipline classification code
0812
Abstract
In machine learning, both the choice of network and the selection of features are important factors that must be considered carefully, since both influence the result for better or worse. In bioinformatics, the number of features can be very large, which makes effective machine learning difficult. In this study we introduce the idea of feature selection into a bioinformatics problem. We use neural networks in which each input node is associated with a gate. At the beginning of training, all gates are almost closed, so that practically no features are allowed to enter the network. During the training phase, gates are either opened or closed, depending on the requirements. After the selection training phase is completed, gates corresponding to the helpful features are completely opened, while gates corresponding to the useless features are closed more tightly. Some gates may be partially open, depending on the importance of the corresponding features. Thus the network not only selects features in an online manner during learning, but also performs some feature extraction. We combine feature selection with our novel hierarchical machine learning architecture and apply it to multi-class protein fold classification. At the first level the network classifies the data into four major folds: all alpha, all beta, alpha + beta and alpha/beta. At the next level, another set of networks further classifies the data into twenty-seven folds. This approach helps achieve the following. The gating network is found to reduce the number of features drastically: it is interesting to observe that, for the first level, using just 50 features selected by the gating network we can obtain a test accuracy comparable to that obtained using 125 features in neural classifiers. The process also gives us better insight into the folding process; for example, by tracking the evolution of the different gates we can find which characteristics (features) of the data are more important for the folding process. It also reduces the computation time. The use of the hierarchical architecture helps us achieve better performance as well.
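The gated-input idea described in the abstract can be illustrated with a small sketch. The following Python/PyTorch code is not the authors' implementation; the layer sizes, the sigmoid gate parameterization, the sparsity penalty, and the training loop are illustrative assumptions chosen here only to show how per-feature gates that start almost closed can be opened or closed during training, and how a first-level network over the four major structural classes fits into the two-level scheme.

```python
# Minimal sketch (assumptions, not the paper's code) of a neural classifier whose
# input features each pass through a learnable gate that starts almost closed.
import torch
import torch.nn as nn

class GatedMLP(nn.Module):
    def __init__(self, n_features: int, n_classes: int, hidden: int = 64):
        super().__init__()
        # One gate parameter per input feature; sigmoid(-4.0) ~ 0.018,
        # so every gate begins almost closed and no feature enters the network.
        self.gate_logits = nn.Parameter(torch.full((n_features,), -4.0))
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.Tanh(),
            nn.Linear(hidden, n_classes),
        )

    def gates(self) -> torch.Tensor:
        # Gate values in (0, 1): ~1 means fully open, ~0 means closed.
        return torch.sigmoid(self.gate_logits)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Attenuate each feature by its gate before classification,
        # so gates also act as a soft form of feature extraction.
        return self.net(x * self.gates())

def loss_fn(model: GatedMLP, logits, targets, sparsity: float = 1e-3):
    # Cross-entropy plus a penalty that keeps unhelpful gates closed;
    # only features that reduce the error get their gates opened.
    return nn.functional.cross_entropy(logits, targets) + sparsity * model.gates().sum()

if __name__ == "__main__":
    # First-level network: four major classes (all alpha, all beta,
    # alpha + beta, alpha/beta). A second set of such networks would then
    # refine each class into the twenty-seven folds. Data here is random.
    model = GatedMLP(n_features=125, n_classes=4)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x, y = torch.randn(32, 125), torch.randint(0, 4, (32,))
    for _ in range(100):
        opt.zero_grad()
        loss = loss_fn(model, model(x), y)
        loss.backward()
        opt.step()
    # Gates near 1 mark selected features; gates near 0 mark discarded ones.
    selected = (model.gates() > 0.5).nonzero().squeeze(-1)
    print("selected features:", selected.tolist())
```

Tracking the gate values over training epochs, as the abstract suggests, indicates which features matter most for the fold classification; in this sketch that simply means inspecting `model.gates()` after each epoch.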
Pages: 711-720
Number of pages: 10