Protein fold identification using machine learning methods on contact maps

Authors
Vani, K. Suvarna [1]
Kumar, K. Praveen [1]
Affiliations
[1] VR Siddhartha Engn Coll, Dept Comp Sci & Engn, Vijayawada, Andhra Pradesh, India
Keywords
Contact map; SMOTE; Decision Tree; Machine Learning; Recognition
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Proteins are classified into four structural classes (All-Alpha, All-Beta, Alpha+Beta and Alpha/Beta), which are further subdivided into 27 folds. Protein fold classification is cited in the literature as a challenging unbalanced classification problem: reported accuracies on the benchmark data set of Ding et al. are as low as 51.1%, and the highest, 60.5%, is obtained using 2500 features. We represent each protein as a feature vector of length 11 and adopt a boosting approach based on the Synthetic Minority Over-sampling Technique (SMOTE) to balance the data for the 27-way fold classification problem. We build a C4.5 decision tree classifier in combination with the SMOTE boosting algorithm using the novel contact map features and show that the prediction accuracy improves to 64%. An additional advantage of our approach is the reduced dimensionality of the feature vector: 11 features, whereas the literature uses more than 100 on average. Further, we propose an algorithm, ExtractPatterns, that extracts (non-)rectangular 2D regions of contacts from the off-diagonal region of the contact map in linear time. The length-11 feature vector is built from the diagonal and off-diagonal statistical features derived in this study.
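The abstract does not spell out how the data are balanced, so the following is only a minimal sketch of the generic SMOTE oversampling step, not the authors' SMOTE boosting pipeline or their C4.5 classifier. The function name `smote_oversample`, its parameters, and the random toy vectors are all hypothetical, chosen for illustration. Each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority samples.

    Illustrative sketch of plain SMOTE only; the boosting rounds and the
    decision tree mentioned in the abstract are not implemented here.
    """
    rng = random.Random(seed)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(n)
        base = minority[i]
        # Other minority samples, sorted by Euclidean distance to the base.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Toy stand-in for a rare fold: six random 11-length feature vectors.
    toy_rng = random.Random(42)
    rare_fold = [[toy_rng.random() for _ in range(11)] for _ in range(6)]
    extra = smote_oversample(rare_fold, n_new=20, k=3)
    print(len(extra), len(extra[0]))
```

Because every synthetic vector is a convex combination of two real ones, each feature stays within the range observed for that class. In a 27-way setting this step would presumably be applied per under-represented fold before or during training, but that detail is an assumption.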
Pages: 6