An Interpretable Machine-Learning Algorithm to Predict Disordered Protein Phase Separation Based on Biophysical Interactions

Times cited: 22
Authors
Cai, Hao [1]
Vernon, Robert M. [1]
Forman-Kay, Julie D. [1,2]
Affiliations
[1] Hosp Sick Children, Mol Med Program, Toronto, ON M5G 0A4, Canada
[2] Univ Toronto, Dept Biochem, Toronto, ON M5S 1A8, Canada
Funding
Canadian Institutes of Health Research; Natural Sciences and Engineering Research Council of Canada
Keywords
biomolecular condensates; machine learning; predictor; physical interactions; intrinsically disordered proteins; phase separation; ALPHA-HELICAL STRUCTURE; COACERVATION; GRANULES; MODEL; FORM
DOI
10.3390/biom12081131
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Protein phase separation is increasingly understood to be an important mechanism of biological organization and biomaterial formation. Intrinsically disordered protein regions (IDRs) are often significant drivers of protein phase separation. A number of protein phase-separation-prediction algorithms are available, with many being specific for particular classes of proteins and others providing results that are not amenable to the interpretation of the contributing biophysical interactions. Here, we describe LLPhyScore, a new predictor of IDR-driven phase separation, based on a broad set of physical interactions or features. LLPhyScore uses sequence-based statistics from the RCSB PDB database of folded structures for these interactions, and is trained on a manually curated set of phase-separation-driving proteins with different negative training sets including the PDB and human proteome. Competitive training for a variety of physical chemical interactions shows the greatest contribution of solvent contacts, disorder, hydrogen bonds, pi-pi contacts, and kinked beta-structures to the score, with electrostatics, cation-pi contacts, and the absence of a helical secondary structure also contributing. LLPhyScore has strong phase-separation-prediction recall statistics and enables a breakdown of the contribution from each physical feature to a sequence's phase-separation propensity, while recognizing the interdependence of many of these features. The tool should be a valuable resource for guiding experiments and providing hypotheses for protein function in normal and pathological states, as well as for understanding how specificity emerges in defining individual biomolecular condensates.
Pages: 28