Protein encoder: An autoencoder-based ensemble feature selection scheme to predict protein secondary structure

Cited by: 8
Authors
Uzma [1]
Manzoor, Usama [1,2]
Halim, Zahid [1]
Affiliations
[1] Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Faculty of Computer Science and Engineering, Machine Intelligence Research Group (MInG), Topi 23460, Pakistan
[2] Namal University, Department of Computer Science, Mianwali, Pakistan
Keywords
Protein secondary structure prediction; Ensemble methods; Autoencoder; Feature extraction; Amino acids; Random forest; Sequence
DOI
10.1016/j.eswa.2022.119081
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Proteins play a vital role in the human body, as they perform important metabolic tasks. Experimental identification of protein structure is expensive and time-consuming. Predicting protein secondary structure is an important step toward identifying a protein's tertiary structure and its folds. Selecting a feature subset from the high-dimensional protein primary sequence is key to improving the accuracy of Protein Secondary Structure Prediction (PSSP), so the relevant features must be selected from the high-dimensional data before prediction. This work presents a novel PSSP method based on a two-phase feature selection technique. The first phase uses an unsupervised autoencoder for feature extraction. The second phase is an ensemble of three feature selection methods: generic univariate select, recursive feature elimination, and Pearson's correlation. This phase combines the resulting feature subsets using mutual information to select the optimal feature subset. The selected features are then passed to three classifiers: random forest, decision tree, and multilayer perceptron. Two sets of experiments are performed on five datasets to assess the proposed work. The proposed solution is compared with three state-of-the-art methods in terms of Q3 accuracy, Q8 accuracy, and segment overlap (SOV) score. The results show that the proposed framework outperforms past contributions in the majority of cases. It achieves Q8 accuracies of 82%, 80%, 79%, 73%, and 74% and Q3 accuracies of 90%, 90%, 92%, 79%, and 74% on the CB6133, CB6133-filtered, CB513, CASP10, and CASP11 datasets, respectively.
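The abstract names two technical pieces without spelling them out. The first is the Q3/Q8 accuracy metrics. The second is the step that "combines multiple feature subsets using mutual information." Below is a minimal stdlib-only sketch of both, under stated assumptions; it is not the authors' implementation.

The 8-to-3 state reduction shown is a common DSSP convention (H, G, I to helix; E, B to strand; the rest to coil). The paper may use a different mapping. The merge rule is a hypothetical illustration: pool the indices chosen by each selector, rank them by mutual information with the labels, and keep the top `k`. It assumes features have already been discretized. All function names (`q_accuracy`, `q3_from_q8`, `mutual_information`, `combine_subsets`) are hypothetical.

```python
import math
from collections import Counter

# Assumed (common, not paper-confirmed) DSSP 8-state -> 3-state reduction.
Q8_TO_Q3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E",
            "T": "C", "S": "C", "L": "C", "C": "C", "-": "C"}


def q_accuracy(true_states, pred_states):
    """Per-residue accuracy: fraction of positions where prediction matches truth."""
    if len(true_states) != len(pred_states) or not true_states:
        raise ValueError("sequences must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(true_states, pred_states))
    return hits / len(true_states)


def to_q3(states):
    """Collapse an 8-state string to 3 states using the assumed mapping."""
    return "".join(Q8_TO_Q3[s] for s in states)


def q3_from_q8(true8, pred8):
    """Q3 accuracy computed after reducing both 8-state strings to 3 states."""
    return q_accuracy(to_q3(true8), to_q3(pred8))


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())


def combine_subsets(columns, labels, subsets, k):
    """Hypothetical merge rule: union the selectors' feature indices, score each
    pooled feature by mutual information with the labels, keep the top k
    (ties broken by lower index)."""
    pooled = set().union(*subsets)
    scores = {j: mutual_information(columns[j], labels) for j in pooled}
    return sorted(pooled, key=lambda j: (-scores[j], j))[:k]


if __name__ == "__main__":
    true8, pred8 = "HHHGGEEBTTSC", "HHHHHEEETTCC"
    print(f"Q8 = {q_accuracy(true8, pred8):.3f}")  # Q8 = 0.667
    print(f"Q3 = {q3_from_q8(true8, pred8):.3f}")  # Q3 = 1.000

    labels = [0, 0, 1, 1, 0, 1]
    columns = [
        [0, 0, 1, 1, 0, 1],  # feature 0: identical to the labels
        [0, 1, 0, 1, 0, 1],  # feature 1: weakly informative
        [1, 1, 1, 1, 1, 1],  # feature 2: constant, carries no information
        [1, 1, 0, 0, 1, 0],  # feature 3: inverted labels, still fully informative
    ]
    # Pretend each of three selectors returned one of these index sets.
    subsets = [{0, 2}, {1, 2}, {3}]
    print(combine_subsets(columns, labels, subsets, k=2))  # [0, 3]
```

The toy run shows why Q3 is usually at least as high as Q8. Confusions within a class (G predicted as H, B as E, S as C) count as errors in Q8 but disappear after the reduction. It also shows why mutual information suits the merge step. Feature 3 is perfectly anti-correlated with the labels, yet it scores as high as feature 0, while the constant feature scores zero.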
Pages: 16