RFCRYS: Sequence-based protein crystallization propensity prediction by means of random forest

Cited by: 21
Authors
Jahandideh, Samad [1]
Mahdavi, Abbas [2]
Affiliations
[1] Univ Alabama Birmingham, Dept Biostat, Sect Stat Genet, Birmingham, AL 35294 USA
[2] Vali E Asr Univ Rafsanjan, Dept Stat, Fac Math Sci, Rafsanjan, Iran
Keywords
Random forest algorithm; X-ray crystallography; Protein structure; AMINO-ACID-COMPOSITION; SUBCELLULAR-LOCALIZATION; STRUCTURAL GENOMICS; CLASSIFIER; ALGORITHM; MECHANISM; CHANNEL; SINGLE; IMPACT; SCALE;
DOI
10.1016/j.jtbi.2012.04.028
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Production of high-quality diffracting crystals is a critical step in determining the 3D structure of a protein by X-ray crystallography. Only 2%-10% of crystallization projects result in high-resolution protein structures. Several computational methods for predicting protein crystallizability have been developed previously. In this work, we introduce RFCRYS, a Random Forest-based method to predict the crystallizability of proteins. RFCRYS uses mono-, di-, and tri-peptide amino acid compositions, frequencies of amino acids in different physicochemical groups, isoelectric point, molecular weight, and protein sequence length, all derived from the primary sequence, to predict crystallizability using two different databases. RFCRYS was compared with previous methods, and the results show that the proposed method with this feature set outperforms existing predictors, with higher accuracy, MCC, and specificity. In particular, the method achieves a high specificity of 0.95, which means RFCRYS rarely mispredicts a protein chain as crystallizable, which helps save time and resources. In conclusion, RFCRYS provides accurate crystallizability prediction for a protein chain and can be applied to support crystallization projects in achieving a higher success rate in obtaining diffraction-quality crystals. Published by Elsevier Ltd.
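The sequence-derived features and evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`kmer_composition`, `sequence_features`, `specificity`, `mcc`) are hypothetical, only the composition and length features are computed (physicochemical group frequencies, isoelectric point, and molecular weight are omitted), and the metrics use the standard textbook definitions of specificity and the Matthews correlation coefficient.

```python
from collections import Counter
from itertools import product
from math import sqrt

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(seq, k):
    """Relative frequency of every possible k-mer over the 20 standard residues.

    Windows containing non-standard letters still count toward the total,
    but they get no feature of their own.
    """
    seq = seq.upper()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product(AMINO_ACIDS, repeat=k)
    }


def sequence_features(seq):
    """Length plus mono-, di-, and tri-peptide compositions (1 + 20 + 400 + 8000 values)."""
    feats = {"length": float(len(seq))}
    for k in (1, 2, 3):
        feats.update(kmer_composition(seq, k))
    return feats


def specificity(tn, fp):
    """Fraction of true negatives among all actual negatives."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    feats = sequence_features("ACAC")  # toy sequence, not real data
    print(len(feats), feats["A"], feats["AC"])
    print(specificity(tn=95, fp=5))
```

A feature dictionary of this kind could be turned into a fixed-order vector and passed to any random forest implementation. A specificity of 0.95 corresponds to 95 of every 100 non-crystallizable chains being correctly rejected.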
Pages: 115-119
Number of pages: 5
Related papers
50 in total
  • [22] SeqTMPPI: Sequence-Based Transmembrane Protein Interaction Prediction
    Wang, Han
    Jiang, Jiuhong
    Chen, Qiufen
    Zhang, Chunhua
    Lu, Chang
    Ma, Zhiqiang
    2020 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2020, : 96 - 99
  • [23] Recent advances in sequence-based protein structure prediction
    Dukka, B. K. C.
    BRIEFINGS IN BIOINFORMATICS, 2017, 18 (06) : 1021 - 1032
  • [24] SOLpro: accurate sequence-based prediction of protein solubility
    Magnan, Christophe N.
    Randall, Arlo
    Baldi, Pierre
    BIOINFORMATICS, 2009, 25 (17) : 2200 - 2207
  • [25] Sequence-based prediction of protein binding mode landscapes
    Horvath, Attila
    Miskei, Marton
    Ambrus, Viktor
    Vendruscolo, Michele
    Fuxreiter, Monika
    PLOS COMPUTATIONAL BIOLOGY, 2020, 16 (05)
  • [26] Sequence-Based Prediction of Plant Protein-Protein Interactions by Combining Discrete Sine Transformation With Rotation Forest
    Pan, Jie
    Li, Li-Ping
    Yu, Chang-Qing
    You, Zhu-Hong
    Guan, Yong-Jian
    Ren, Zhong-Hao
    EVOLUTIONARY BIOINFORMATICS, 2021, 17
  • [27] AmPEP: Sequence-based prediction of antimicrobial peptides using distribution patterns of amino acid properties and random forest
    Bhadra, Pratiti
    Yan, Jielu
    Li, Jinyan
    Fong, Simon
    Siu, Shirley W. I.
    SCIENTIFIC REPORTS, 2018, 8
  • [28] Recent developments of sequence-based prediction of protein-protein interactions
    Murakami, Yoichi
    Mizuguchi, Kenji
    BIOPHYSICAL REVIEWS, 2022, 14 (06) : 1393 - 1411
  • [30] Meta prediction of protein crystallization propensity
    Mizianty, Marcin J.
    Kurgan, Lukasz
    BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS, 2009, 390 (01) : 10 - 15