RFCRYS: Sequence-based protein crystallization propensity prediction by means of random forest

Cited by: 21
Authors
Jahandideh, Samad [1]
Mahdavi, Abbas [2]
Affiliations
[1] Univ Alabama Birmingham, Dept Biostat, Sect Stat Genet, Birmingham, AL 35294 USA
[2] Vali E Asr Univ Rafsanjan, Dept Stat, Fac Math Sci, Rafsanjan, Iran
Keywords
Random forest algorithm; X-ray crystallography; Protein structure; AMINO-ACID-COMPOSITION; SUBCELLULAR-LOCALIZATION; STRUCTURAL GENOMICS; CLASSIFIER; ALGORITHM; MECHANISM; CHANNEL; SINGLE; IMPACT; SCALE
DOI
10.1016/j.jtbi.2012.04.028
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Production of high-quality diffracting crystals is a critical step in determining the 3D structure of a protein by X-ray crystallography. Only 2%-10% of crystallization projects result in high-resolution protein structures. Several computational methods for predicting protein crystallizability have been developed previously. In this work, we introduce RFCRYS, a Random Forest-based method to predict the crystallizability of proteins. RFCRYS uses mono-, di-, and tri-peptide amino acid compositions, frequencies of amino acids in different physicochemical groups, isoelectric point, molecular weight, and protein sequence length, all derived from the primary sequence, to predict crystallizability using two different databases. RFCRYS was compared with previous methods, and the results show that the proposed method with this feature set outperforms existing predictors, with higher accuracy, MCC, and specificity. In particular, our method is characterized by a high specificity of 0.95, which means RFCRYS rarely mispredicts a non-crystallizable protein chain as crystallizable, which would help save time and resources. In conclusion, RFCRYS provides accurate crystallizability prediction for a protein chain and can be applied to support crystallization projects in achieving a higher success rate in obtaining diffraction-quality crystals. Published by Elsevier Ltd.
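The sequence-derived composition features and the reported metrics can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`kmer_composition`, `sequence_features`, `specificity`, `mcc`) are hypothetical, the abstract does not specify the physicochemical groupings or how isoelectric point and molecular weight were computed (so those features are omitted), and the Random Forest classifier itself is not shown. Specificity and MCC follow their standard confusion-matrix definitions.

```python
from collections import Counter
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_composition(seq, k):
    """Relative frequency of each of the 20**k possible k-peptides in seq.

    Windows containing non-standard residues (e.g. 'X') count toward the
    total but have no key of their own; this is an assumption of the sketch.
    """
    seq = seq.upper()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    keys = ("".join(p) for p in product(AMINO_ACIDS, repeat=k))
    return {key: (counts[key] / total if total else 0.0) for key in keys}


def sequence_features(seq):
    """Sequence length plus mono-, di-, and tri-peptide compositions."""
    feats = {"length": float(len(seq))}
    for k in (1, 2, 3):
        feats.update(kmer_composition(seq, k))
    return feats


def specificity(tn, fp):
    """TN / (TN + FP): fraction of non-crystallizable chains predicted as such."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

A feature dictionary built this way has 1 + 20 + 400 + 8000 entries per sequence and could be passed, as a fixed-order vector, to any Random Forest implementation. A specificity of 0.95 corresponds to, for example, 95 true negatives and 5 false positives among 100 non-crystallizable chains.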
Pages: 115-119
Page count: 5